Programmable Logic Devices in Experimental Quantum Optics by Stockton, J. et al.
arXiv:quant-ph/0203143v1 28 Mar 2002
Programmable Logic Devices in Experimental Quantum Optics
J. Stockton,∗ M. Armen, and H. Mabuchi
Norman Bridge Laboratory of Physics 12-33, California Institute of Technology, Pasadena, California 91125 USA
(Dated: October 25, 2018)
We discuss the unique capabilities of programmable logic devices (PLD’s) for experimental quan-
tum optics and describe basic procedures of design and implementation. Examples of advanced
applications include optical metrology and feedback control of quantum dynamical systems. As a
tutorial illustration of the PLD implementation process, a field programmable gate array (FPGA)
controller is used to stabilize the output of a Fabry-Perot cavity.
I. INTRODUCTION
Automatic controllers are pervasive in experimental
physics. Servos typically play a role behind the scenes,
stabilizing environmental conditions (e.g. temperature,
frequency and amplitude of driving lasers) for the phys-
ical system of primary interest (e.g. quantum dots,
trapped atoms or molecules). But the system of interest
can itself be the explicit object of sophisticated control
strategies. An increasing number of experimental quan-
tum systems are developing to the point where coherent
dynamics occur at a time scale longer than that of avail-
able detectors and actuators [1, 2, 3]. This separation of
time scales opens the door for real-time feedback control
to be applied in quantum-mechanical scenarios.
New theoretical and experimental tools will be re-
quired to achieve quantum control objectives. Concerted
efforts are currently being made to extend classical con-
trol theory to quantum problems where back-action can-
not be ignored [5, 6]. Given the inherent nonlinearity
of conditional quantum dynamics, optimal control laws
cannot be practically implemented with analog circuits,
necessitating fast digital control. Even for linear systems,
programmable logic may be superior to analog methods
when a precisely shaped transfer function is desired. For
these reasons, one expects that programmable logic devices
(PLD’s) with high processing speed and low latency
will prove to be invaluable as quantum and classical controllers.
PLD’s are already a standard tool in industry and some
areas of science, but they have yet to attain widespread
use in fields such as quantum optics and quantum infor-
mation science. Our aim in this paper will be to convey
a base level of knowledge required to use these devices in
representative experimental setups. First, we motivate
the use of programmable logic with some potential ap-
plications. We then describe the details of practical im-
plementation, from determining the required hardware
specifications to completing the design flow. Finally, we
demonstrate this process with a familiar example of clas-
sical optical control by using a Field Programmable Gate
Array (FPGA) to lock a Fabry-Perot cavity.
∗Electronic address: jks@caltech.edu
II. APPLICATIONS
An outstanding feature of PLD’s is that they can im-
plement complex non-linear logic with relatively low la-
tency. Here ‘latency’ refers to the delay between the
time that a signal is received as input and the time that
a calculation based on it becomes available as output.
This reaction time is of little consequence in many data-
processing applications, but is critical in control loops.
The control bandwidth of any servo is limited by the in-
verse of this delay.
In addition, most PLD’s can be completely re-
programmed in a matter of minutes, allowing for a high
degree of design flexibility in experimental situations.
Given a PLD with these capabilities, it is not difficult
to imagine a variety of control applications related to
quantum optics. Here we summarize a few potential ex-
amples, some of which are currently being developed.
A. Precise linear servos
In linear control tasks, PLD controllers have a distinct
practical advantage over analog circuitry with regard to
precision and flexibility. For example, it is a well known
control problem to stabilize a plant over one of its reso-
nances. An appropriate controller should precisely com-
pensate the measured center frequency and quality factor
of the resonance. When creating an analog servo the de-
signer must work with discrete components (resistors, ca-
pacitors, etc.) whose impedances have a non-negligible
error range. However, a PLD transfer function can be
specified digitally, making it much easier to closely match
the system dynamics.
Figure 1 shows the near-compensation of a harmonic
oscillator (HO) resonance with a PLD ‘anti-harmonic-
oscillator’ (AHO) transfer function. (Actually, both
transfer functions in the graph are implemented with
a PLD by techniques described later.) Ideally, the HO
transfer function will be transformed into an integrator
transfer function (with a constant -90 degrees of phase)
when multiplied by the AHO compensator. The devia-
tion from a perfect integrator is due to a slight error in
the assumed damping. Refinements to the AHO design
could remove this non-ideality.
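The compensation idea can be checked numerically. The sketch below assumes a standard second-order form for the HO and defines the AHO as the inverse of that form multiplied by an integrator; these forms, the 1 kHz resonance, and the damping values are illustrative assumptions, not the transfer functions used for Figure 1. With matched damping the product has a flat -90 degree phase, while a mismatch in the assumed damping shifts the phase near resonance.

```python
import cmath
import math

def ho(w, w0, zeta):
    """Assumed harmonic-oscillator response at angular frequency w."""
    s = 1j * w
    return w0**2 / (s**2 + 2 * zeta * w0 * s + w0**2)

def aho(w, w0, zeta_assumed):
    """Assumed compensator: inverse HO multiplied by an integrator w0/s."""
    s = 1j * w
    return (s**2 + 2 * zeta_assumed * w0 * s + w0**2) / (w0 * s)

def product_phase_deg(w, w0, zeta, zeta_assumed):
    """Phase (degrees) of the compensated response HO*AHO."""
    return math.degrees(cmath.phase(ho(w, w0, zeta) * aho(w, w0, zeta_assumed)))

W0 = 2 * math.pi * 1e3   # hypothetical 1 kHz resonance
ZETA = 0.01              # hypothetical true damping ratio

# Evaluate slightly above resonance, where a damping error is most visible.
matched = product_phase_deg(1.05 * W0, W0, ZETA, ZETA)
mismatched = product_phase_deg(1.05 * W0, W0, ZETA, 2 * ZETA)
print(f"matched damping:    {matched:.2f} deg")
print(f"mismatched damping: {mismatched:.2f} deg")
```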

FIG. 1: The blue plot is a harmonic oscillator (HO) transfer
function and the red plot is the anti-harmonic-oscillator
(AHO) transfer function. The product of the two should
resemble an integrator transfer function (green) with a constant
-90 degree phase. [Plot: magnitude (dB) and phase (deg) versus
frequency (Hz) from 10^2 to 10^4 Hz for HO, AHO, and HO*AHO.]

PLD’s will obviously not replace every linear servo in
the typical laboratory, but the ability to optimize the
stability of critical laser systems (for example) is a
considerable resource. We detail the use of a PLD controller
to optimally perform a linear control task in a later section.
B. Optimal measurement
In quantum feedback scenarios, either the measure-
ment operators or the system Hamiltonian can be mod-
ulated in real time according to the information gained
from a continuous measurement record.
Consider the case where only the measurement oper-
ators are adjusted. The goal of the entire measurement
may be to most accurately determine the initial state of
the system. Other situations may call for the measure-
ment of only a single state parameter, where all other
state variables are either assumed or neglected. The
authors are currently developing a system of this type
where the goal is to optimally measure the phase of a
single pulse of light. We constrain ourselves to measuring
pulses that are long enough to have a well-defined phase
and also long enough to allow us to feed back
the measurement signal multiple times before the pulse
has been completely destroyed by the detectors.
Wiseman et al. have determined close-to-optimal mea-
surement schemes for this system based on quantum tra-
jectory theory [6]. In short, they consider the signal to
be measured in an adaptive homodyne set-up where the
pulse is mixed with a strong local oscillator whose phase,
Φ, is continuously adjusted (within the duration of each
pulse) according to the measured homodyne current, I.
To first order, the job of the algorithm is to lock to the
side of the interference fringe, thus Φ is adjusted until I
is zero.
Despite this simplistic description, the general optimal
algorithm (f : I ⇒ Φ) is a highly non-linear function
based on state estimation. It has been shown that the
estimated state at any time is a function of only two pa-
rameters and the initial conditions. In terms of a scaled
time v, the parameters are
$$ A_v = \int_0^v I(u)\, e^{i\Phi(u)}\, du \tag{1} $$
$$ B_v = -\int_0^v e^{2i\Phi(u)}\, du \tag{2} $$
The phase of the local oscillator is usually taken to be
$\Phi(v) = \hat{\varphi}(v) + \pi/2$, where $\hat{\varphi}(v)$ is the phase estimate to
be used during the course of feedback. If one were to
stop the feedback at any time, the best phase estimate
would be $\hat{\varphi}_C(v) = \arg(C_v)$, where $C_v = A_v v + B_v A_v^*$.
However, for subtle reasons, $\hat{\varphi}_C(v)$ should not be used as
the estimate during the course of the feedback.
One simple algorithm uses $\hat{\varphi}(v) = \arg(A_v)$. With this
choice, the algorithm simply reduces to a gain-scheduled
integrator of the form
$$ d\Phi(v) = \frac{I(v)}{\sqrt{v}} \tag{3} $$
where $v$ is the time since the beginning of the pulse and
the $1/\sqrt{v}$ factor represents the effective gain. Currently,
this algorithm is being implemented with an FPGA that
creates the $1/\sqrt{v}$ gain factor with a look-up table
representation of the function as described in a later section.
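As a rough illustration of how such a gain-scheduled integrator pulls the photocurrent to zero, the sketch below iterates the update of Eq. (3) in discrete time. The fringe model (a noiseless current proportional to the cosine of the phase difference), the unit time step, the loop gain, and the test phase are all assumptions made for the example; a real measurement is dominated by shot noise and includes loop delay.

```python
import math

def adaptive_lock(phase_true, n_steps=500, gain=0.5):
    """Iterate Phi <- Phi + gain * I / sqrt(t) with an assumed fringe model."""
    phi_lo = 0.0  # local oscillator phase Phi, initialized at pulse start
    for t in range(1, n_steps + 1):
        # Assumed noiseless homodyne current: zero when Phi = phase + pi/2.
        current = math.cos(phi_lo - phase_true)
        phi_lo += gain * current / math.sqrt(t)
    return phi_lo, math.cos(phi_lo - phase_true)

phi_lo, residual = adaptive_lock(phase_true=0.7)
estimate = phi_lo - math.pi / 2  # phase estimate, since Phi = estimate + pi/2
print(f"phase estimate {estimate:.6f} rad, residual current {residual:.2e}")
```

Because the scheduled gain falls off as the pulse progresses, early samples move the local oscillator quickly toward the fringe side and later samples refine the lock.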
More sophisticated algorithms (with optimal perfor-
mance for certain squeezed states) have been proposed
that use feedback of the form
$$ \hat{\varphi}(v) = \arg\!\left( C_v^{1-\epsilon(v)} A_v^{\epsilon(v)} \right) \tag{4} $$
where $\epsilon(v)$ is also a function of $A_v$ and $B_v$. In this case,
the algorithm is sufficiently complex that any analog
implementation would be extremely difficult to design.
In any case, the non-linear, low latency behavior of
PLD’s suggests that they are a suitable tool for this task.
Given that the form of a desired algorithm may change
frequently with the introduction of realistic experimental
complications, the rapid prototyping allowed by a PLD
is also extremely convenient.
C. Feedback control
When the goal is control rather than optimal measure-
ment, a non-trivial Hamiltonian of the system will be con-
trolled by the measurement record. Consider the case of
an atom drifting through the light field of a small Fabry-
Perot cavity. As has been demonstrated, the position of
the atom may be imprinted onto the output light of the
cavity [3]. This information can potentially be mapped
back onto the intensity and phase of the input laser with
the goal of trapping the atom in the cavity for extended
periods of time [4].
Optimal control of the atom’s position will require
a complex predictor-corrector structure in the feedback
loop at µsec time scales. If the associated calculations
can be sufficiently reduced, a PLD with effective clocking
speeds above 1 MHz will be able to perform this task. Of
course, the effectiveness of the control algorithm will de-
pend on the assumed dynamics of the system from which
it is derived. If the system needs to be described quantum
mechanically, we should institute a conditional quantum
state estimator. If a classical description is sufficient, we
can use a less complicated algorithm. The performance
of different controllers will be a strong indicator of the
validity of our descriptions. The ability to quickly re-
design the PLD will be particularly advantageous when
exploring this boundary.
Hamiltonian feedback can also be used to manipulate
the internal states of atomic and molecular systems. Nu-
merous groups have become interested in shaping fem-
tosecond laser pulses to drive transitions which may be
inaccessible using traditional means [8]. This includes
the ability to synthesize rare molecular compounds. For
example, by iteratively reading the fluorescence spectrum
of the system and intelligently moving in the parameter
space of the pulse shape, one attempts to land at a shape
conducive to creating the desired state or compound.
This procedure can happen in two regimes, ‘learning
control’ or ‘feedback control’. For learning control we
consider using a new sample for every pulse, whereas for
feedback control we consider using the same sample on
every pulse. In the latter case, the algorithm assumes
that the sample has a long enough dephasing time (mem-
ory) that a significant degree of coherence is retained be-
tween pulses. For either case, especially the second, a
PLD based controller may have significant advantages
over alternative controller architectures.
D. Decision and control for quantum information
processing
In a generic quantum computing architecture, there
exist classical logic steps which involve performing a co-
herent quantum operation conditioned on the result of
a measurement. For example, quantum error correct-
ing codes can combat decoherence by mapping measured
errors to appropriate correction operators [7]. In an ex-
periment, this measurement-operation procedure should
be performed much faster than the dephasing rate of the
system. If the operations can be performed quickly upon
command, PLD’s will be able to orchestrate these codes
in a reliable and reconfigurable fashion with minimal de-
lay.
Even for non-conditional algorithms, PLD’s can
streamline the implementation of complex instruction
sets. In particular, groups working on ion trap com-
puting have developed means of performing entangle-
ment algorithms [1], but with an extensive overhead of
macroscopic equipment that requires detailed manual ad-
justment whenever the algorithm is changed. Without
pushing its computational limits, a PLD can be made
to streamline such logic networks. By using software de-
fined algorithms, the users eliminate the time and risk
of error associated with manual realignment of network
components. Commercial magnetic resonance systems
use PLD’s for similar reasons.
As quantum computing architectures grow to the point
where conditional and non-conditional algorithms must
be integrated in a way that is fast and flexible, pro-
grammable logic will be able to handle the task in a
convenient manner.
The success of any PLD controller will depend on its
dynamic range and effective bandwidth. Next we dis-
cuss in more practical terms what levels of system perfor-
mance can be reasonably expected from currently avail-
able PLD’s.
III. DESIGN
A. Hardware
Once it is determined that a control algorithm needs to
be implemented digitally, a designer is confronted with
a wide array of possible controllers and corresponding
acronyms. In addition to PLD’s, the options include con-
ventional microprocessor systems, DSP’s (digital signal
processors), and ASIC’s (application specific integrated
circuits). Of course, the choice of controller is highly
dependent on the algorithm being implemented because
each device has its own trade-offs. Microprocessor sys-
tems are general enough to allow for a simple means of
programming complex algorithms. However, these sys-
tems rely on a single bus architecture which forms a
significant bottleneck in signal processing applications.
Overall throughput may be high, but a large delay lim-
its typical controllers to slow applications with kHz scale
bandwidths. In addition, unreliable operating systems
may present undesirable interrupt signals during critical
stages of processing. DSPs are specialized microproces-
sor systems with a multiple bus design that are optimized
for signal processing applications. Due to their parallel
architecture, DSP’s can attain low-latency performance,
but require a significant degree of high-level design exper-
tise. ASIC’s are like PLD’s in that the user designs them
from the gate level, but ASIC’s are irreversibly hard-
wired with a single application. While PLD’s generally
have fewer resources available than ASIC’s, they offer an
efficient parallel computation structure along with repro-
grammability and a relatively simple design process [9].
The market for PLD’s is currently dominated by two
companies: Xilinx and Altera. Devices from both companies
have had extensive product development in industry,
thus a substantial support network is available to designers.
In choosing between PLD companies, several factors
beyond the chip performance need to be considered,
including the quality of the associated software environ-
ments. To obtain the maximum control bandwidth, we
chose to work with a Field Programmable Gate Array
(FPGA) from Xilinx.
The logic structure of a Xilinx FPGA is designed to
handle arbitrary algorithm architectures. The FPGA
mostly consists of a grid with thousands of Configurable
Logic Blocks (CLB) connected by programmable inter-
connections. Each CLB contains a few small look-up
tables which can serve as simple logic elements (AND,
OR, etc) when programmed. Also interspersed in this
grid are larger blocks of RAM that can be programmed
as user defined functions with a large domain and range.
Since each logic element needs to be triggered to operate,
the distribution of a uniform clock signal with constant
frequency and phase is a considerable design issue. Thus
FPGA architectures commonly have digital clock man-
agers (DCM) or delay locked loops (DLL) that de-skew
the clock signal across the device.
The performance of FPGA architectures has been im-
pressively increasing in recent years. To give a current
indication of their level of performance, we quote some
of the characteristics of one of the top of the line de-
vices available on the market today. The Xilinx Virtex
II can contain up to 10 million system gates and have
an internal clock frequency (fC) up to 420 MHz. The
input-output speed can be above 840 Mb/s, which roughly
matches the maximum speed of the best analog-to-digital
converters (100 MSPS for a 12 bit sample, Analog Devices
AD9432). This same FPGA has up to 192 SelectRAM
blocks of 18 kbit each. Because a strong demand from
industry drives the development of FPGA technology,
these performance specifications will likely improve sig-
nificantly in the short term future.
Of course these devices must be coupled to a board, in-
troducing other practical issues. The system used in the
cavity lock described below is a GVA-290 board (G.V. &
Associates) with two Xilinx Virtex-E XCV1000E FPGA
chips. Signals enter and exit the board through four input
and four output SMA connectors. The signals are
digitized by an ADC (Analog Devices AD9432) at the input and
converted back to analog by a DAC (Analog Devices AD9762)
at the output. Each ADC is located on a detachable
daughter board, allowing for converter upgrades and the
addition of customized components and filters. Both the
ADCs and DACs have 12 bit resolution and are driven at
the clock speed of 100 MHz. A crystal oscillator provides
the clock signal to the FPGA, which distributes a syn-
chronized signal internally with DLLs and also outputs
the driving signal for the ADC and DAC at a controlled
phase. Unlike standard models, the board was ordered
with DC coupled inputs, allowing us to have broadband
control to DC. Boards often come with anti-aliasing analog
filters, but these were not included here due to the
substantial group delay a high-order filter can impose on the
signal. The cost of this particular board including devices
is approximately $10,000, but it should be stressed
that functional systems could be assembled at far less
cost.

FIG. 2: The amplitude response and delay of the entire GVA-290
board (ADC → FPGA → DAC). Notice that the delay
below the Nyquist frequency (f_C/2 = 50 MHz) is ∼ 160 ns.
The phase response in the constant delay region is linear with
slope proportional to the delay. [Plot: magnitude (dB) and
delay (ns) versus frequency (Hz) from 10^5 to 10^8 Hz.]

Xilinx also offers a special academic program through
which university researchers can obtain the necessary
software environment and a limited range of hardware
products.
We can now discuss the latency and throughput of our
controller in more detail. The latency is defined as the
amount of time for an algorithm to process a single sam-
ple all the way through. The throughput is defined as
the number of samples (or bits) per second being output
from the device. For example, consider a system of N
components in series, each with the same sampling rate
$f = 1/\tau$. Also assume the system is ‘pipelined’, meaning
that a new sample is loaded every $\tau$ seconds and samples
are registered (values held) in-between components.
In this case, the latency is $N\tau$, while the throughput is
$f$. If this were a controller, the bandwidth of control
would be limited to the inverse of the latency, $1/(N\tau)$, not
the throughput.
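The distinction between latency and throughput can be made concrete with a toy model of a pipelined chain. The sketch below treats each of the N components as a single register and shifts one sample per clock tick; the stage count, clock period, and sample values are arbitrary choices for illustration.

```python
from collections import deque

def run_pipeline(samples, n_stages):
    """Shift one sample per tick through n_stages registers.

    Returns the output seen on each tick; None marks ticks on which
    no valid sample has yet reached the end of the chain.
    """
    registers = deque([None] * n_stages)
    outputs = []
    for sample in samples:
        registers.appendleft(sample)      # load a new sample every tick
        outputs.append(registers.pop())   # oldest value leaves the chain
    return outputs

TAU = 10e-9   # assumed clock period (10 ns)
N = 3         # assumed number of registered stages

outputs = run_pipeline([1, 2, 3, 4, 5], N)
latency = N * TAU        # time for one sample to traverse the chain
throughput = 1.0 / TAU   # samples delivered per second once the pipe is full
print(outputs, latency, throughput)
```

After the first N ticks a new output appears on every tick, so the throughput stays at f even though each individual sample is N ticks old when it emerges.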
One of the principal advantages of FPGA technology
is that the delay can be quite small. Consider the case
where the FPGA of the GVA-290 board is programmed
to pass a signal through without any manipulation. Fig-
ure 2 shows the transfer function and delay of this config-
uration. The ADC, FPGA, and DAC are all clocked at
100 MHz and each one takes a certain number of cycles
(10 ns/cycle) to perform its function. The ADC imposes
a delay of 10 cycles, the buffers of the FPGA impose a
delay of 4 cycles, and the DAC only delays the signal
about 1 cycle. Adding all this to a small delay from
other components, we find that below the Nyquist fre-
quency (fC/2 = 50 MHz) the signal passes through at
unity gain with a constant overall delay of ∼ 160 ns.
Thus the maximum control bandwidth for this device is
∼ 6 MHz, and bandwidths in the tens of MHz may be
anticipated with newer versions. If the FPGA algorithm
is simple enough that the ADC dominates the delay, it
may be desirable to use Flash ADCs that have less la-
tency at the expense of a larger power consumption and
smaller number of output bits.
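The delay budget quoted above is simple enough to tabulate. In the sketch below the cycle counts and clock rate are taken from the text, while the 10 ns attributed to other components is an assumed value chosen only to reproduce the measured total of roughly 160 ns.

```python
F_CLOCK = 100e6  # Hz, shared by ADC, FPGA, and DAC

# Clock cycles consumed by each stage, as quoted in the text.
STAGE_CYCLES = {"ADC": 10, "FPGA buffers": 4, "DAC": 1}

OTHER_DELAY = 10e-9  # s, assumed contribution of remaining components

cycle = 1.0 / F_CLOCK
total_delay = sum(STAGE_CYCLES.values()) * cycle + OTHER_DELAY
max_bandwidth = 1.0 / total_delay  # control bandwidth bound from the delay

for name, cycles in STAGE_CYCLES.items():
    print(f"{name:>12}: {cycles * cycle * 1e9:.0f} ns")
print(f"total delay: {total_delay * 1e9:.0f} ns")
print(f"bandwidth bound: {max_bandwidth / 1e6:.2f} MHz")
```

The ADC accounts for two thirds of the clocked delay, which is why a lower-latency converter is the first place to look when the FPGA algorithm itself is short.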
If the FPGA performs a complex calculation that re-
quires multiple logical steps in series, the delay is in-
creased by an integer number of cycles and the effective
bandwidth suffers. A typical example is that of the FIR
filter mentioned below where, for $B_U$ input bits, the
sampling rate becomes $f_C/B_U$. For any general algorithm,
care should be taken to minimize the number of serial
elements before implementation. If possible, calculations
should be performed in parallel and look-up tables should
be used to evaluate complicated functions.
B. Software
The design process for a particular algorithm has been
largely automated with implementation software environ-
ments like Foundation ISE (Xilinx). Once the design
is entered via one of the options described below, the
program steps through a series of compilation tasks be-
fore downloading onto the device. In order, the design is
analyzed for syntactic errors, synthesized into a generic
circuit, and implemented into an optimal bit stream ap-
propriate to the particular device and board. The bit
stream is then downloaded onto the device to achieve a
stand-alone realization of the desired algorithm. Simu-
lation programs are available at intermediate stages for
debugging purposes. The latest version of Foundation
ISE (4.1) compiles up to 100,000 gates/min. For reason-
able designs, an entire design flow can be expected to
take about 10 minutes. This allows for a rapid prototyp-
ing cycle which is one of the most desirable features of
this technology.
Numerous algorithm entry options are available. Us-
ing a library of primitive components, one can create a
schematic of the desired circuit. Abstract finite state ma-
chine diagrams can also be interpreted. The third option
is a text based design written in either Verilog or VHDL
(VHSIC Hardware Description Language).
As is common in technology standards, the choice of
Verilog vs. VHDL has become a religious one for every-
day practitioners. It is worth pointing out some of the
accepted differences between the languages. Verilog is
generally regarded as being easier to learn. A strong ma-
jority of engineers implementing commercial systems use
Verilog. Historically, VHDL was meant as a description
language before being adopted as a means of synthesis.
As a result, VHDL is a much more strongly ‘typed’ lan-
guage. The range of abstraction is also different between
the two languages. Although there is a considerable over-
lap, Verilog extends to a lower level of abstraction while
VHDL extends to a slightly higher level. For non-critical
reasons, we chose to design in VHDL, hence we will dis-
cuss the following designs in those terms. However, the
discussion is abstract enough that most concepts apply
to both languages.
To first order, VHDL is a text based description of a
schematic design. The mapping between input and out-
put bus variables consists of a series of abstractly defined
components where output ports are connected to input
ports with defined signal variables. Each component has
an associated ‘entity’ and ‘architecture’, where an archi-
tecture is an instantiation of an entity. For example, a
component with entity ‘op-amp’ (with only input and
output ports defined) could have its functionality deter-
mined by the particular architecture ‘op27’. The internal
workings of a particular architecture can be specified
in another VHDL file with more components that are
defined elsewhere. In this way, the code lends itself nicely
to nested levels of detail and organized project design.
Also one can easily swap out components by changing
architectures, but not entities, within the code.
At some point in the hierarchy, primitive components
must be called upon. The Xilinx software offers an ex-
tensive library of such components (AND, OR, etc.) for
use with each particular device. In addition to these ba-
sic primitives, one can also create more complicated, but
commonly used, components with the Xilinx ‘Core Gen-
erator’. These objects (adders, multipliers, filters, DSP
elements) can be customized with user specified param-
eters.
Each component loads inputs and returns outputs trig-
gered by an input clock signal. Hence, when designing
in VHDL one thinks in terms of circuit diagrams where,
on every clock cycle, events happen concurrently across
the device. On the other hand, in traditional C-like com-
puter languages events progress in a serial manner. At
times, serial logic is convenient and in fact VHDL of-
fers a restricted form of serial logic in a form known as a
‘process’. These processes are bits of C-like code that ex-
ecute when triggered. Inside a process, variables can be
manipulated with functions defined in other VHDL files.
However, a signal can only be changed once within a pro-
cess. For this and other reasons, processes are best used
as referees to generate secondary triggering signals and
logic. While processes can perform some level of math,
the heavy lifting is best left to the components which
have been streamlined for such purposes.
An appropriate use of a process is to initialize parameters
and control timing. For example, Figure 3 demonstrates
how the simple adaptive phase algorithm mentioned
above is implemented. Both the VHDL and an
equivalent schematic are shown. The photocurrent, $I$,
enters the device and is multiplied by the time dependent
gain factor, $G(t) = 1/\sqrt{t}$, which is created by sending the
time signal, $t$, through a look-up table (described below).
The resulting signal, $d\Phi(t) = I(t)/\sqrt{t}$, is then sent to one port
of an adder, with the other input port being wired to the
output signal $\Phi(t)$. Because the output is connected to
the input with a delay, the adder serves as an integrator
and executes the relation $\Phi(t) = \Phi(t-1) + d\Phi(t)$ at every
time step. The ‘process’ plays an important role in this
algorithm by initializing the integral value and creating
the time signal. At the beginning of the pulse (integration),
the process initializes $t$ and $\Phi$ to zero. On every
subsequent clock cycle, the process increments $t$ by one and
lets the adder integrate up the signal. At the end of the
pulse, the process waits for the next pulse then repeats
the sequence. Figure 4 shows the algorithm in action.
Through the integrator structure, $\Phi$ is adjusted until $I$
is locked to zero. The overshoot is a result of the FPGA
delay.

[Schematic: Process → Time → LUT → G(t); I(t) and G(t) → multiplier (X) → dφ(t); adder (+) with Reset → φ(t)]

VHDL Equivalent (the symbol -- precedes comments)
  --first component is the look-up table
  --component format is 'instance: type'
  --port map plugs signals into component ports;  _# is label for bit size of bus
  lut_num1 : ramblock_core
    port map (EN=>vcc_sig, WE=>gnd_sig, RST=>gnd_sig, CLK=>clksys,
       ADDR=>time_8,DO=>Gtime_16,DI=>Gtime_16);
  multiplier_num1 : multiplier_core
    port map (A => I_12, B => Gtime_16, CLK => clksys, P => dphi_28);
  --trim signal back down to size
  dphi_12 <= dphi_28(27 downto 16);
  adder_num1 : adder_core
    port map (A => dphi_12, B => phi_21_a, Q => phi_21_b, CLK => clksys);

  --plug signals together
  phi_21_c <= phi_21_b;
  --start process on clock change
  PROCESS(clksys)
  VARIABLE time : integer;
  BEGIN
    --trigger on rising edge of clock
    IF  clksys='1' AND clksys'EVENT THEN
     IF time < tau_experiment THEN
        phi_21_a <= phi_21_c;
        phi_12 <= phi_21_c(20 downto 9);
     ELSE
        --zero signals during dead time
        phi_21_a <= "000000000000000000000";
        phi_12 <= "000000000000";
     END IF;
     IF time = tau_experiment+tau_dead THEN
       time := 0;
     END IF;
     time := time+1;
     --convert variable to signal
     time_8 <= int_to_bus(time);
    END IF;
  END PROCESS;

FIG. 3: FPGA schematic and corresponding code for the
adaptive phase measurement algorithm. In the schematic the
process is not represented as a block component because it is
coded in a serial manner.

A single measurement using this algorithm is shown in
Figure 4. Here the ‘pulse’ is a 50 µsec long time slice of a
weak cw coherent beam. The feedback algorithm is sampling
at 100 MHz with a delay less than 1 µsec. Because
of the delay and other bandwidth limiting components in
the loop, our effective feedback bandwidth is limited to
∼ 1 MHz.

FIG. 4: The Φ(t) and I(t) trajectories for the phase measurement
of a single pulse of light. The current is locked to zero
and the ending point of the phase is a rough estimate of the
measured phase. The true phase measurement is a functional
of both traces. The small oscillations are due to the delay in
the loop. [Plot: Φ (rad) and I (arb.) versus time (µsec) from 0 to 100 µsec.]

As will be demonstrated below, Matlab plays a com-
plementary role in the design process. It can be used to
create the necessary coefficients and memory blocks used
as parameters in the VHDL components. In particular,
the Control and DSP toolboxes provide relevant func-
tionality. Also, Simulink is a good tool for simulating
the associated experiments, where delays and other re-
alistic factors can complicate the dynamics. There exist
software packages that attempt to directly translate from
a Simulink design of an algorithm into equivalent VHDL,
but these packages remain in early stages of development.
Due to their extensive utility, RAM look-up tables and
filter components are worth discussing in greater detail.
1. Look-Up Tables
Most FPGA chips come equipped with large blocks of
internal RAM that can be used as generalized functions
or look-up tables (LUT). Given an amount of memory on
a particular block, the user can decide on a certain num-
ber of input and output bits. During operation the RAM
block returns the value held at the address specified by
the input, effectively implementing the desired function.
For example, on the XCV1000E, 160 blocks of $4096 = 2^{12}$
bits are available for internal use. (As noted above,
the Virtex II devices have much larger 18 kbit blocks.)
To make one block behave as the function $f$ with $B_i$
input bits, the designer would choose the output to be
$B_o = 2^{12-B_i}$ bits. Possible partitions are
$(B_i, B_o) \in \{(1, 2048), (2, 1024), (3, 512), \ldots, (8, 16), \ldots, (12, 1)\}$. Once a
partition is chosen, the designer would use Matlab to
define a block of data consisting of $2^{B_i}$ values each of
size $B_o$ bits, and use this block of data as a parameter
in the VHDL LUT component. If the discretization is a
problem, more RAM blocks can be used to represent the
function. If desired, the memory of a RAM block can
also be dynamically written during operation. With this
ability, an algorithm could easily adapt itself according
to the signals it receives. Both the read and write op-
erations (from/to one RAM address) only take a single
clock cycle.
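The table-generation step described above (done in Matlab in this work) can be sketched in a few lines. The example below fills one 4096-bit block with the $1/\sqrt{t}$ gain function using the (8, 16) partition; the unsigned output coding, the full-scale value, and the choice to saturate the entry at t = 0 are assumptions made for illustration.

```python
import math

BLOCK_BITS = 2 ** 12  # size of one block RAM on the XCV1000E, in bits

def output_bits(input_bits):
    """Output width B_o available when the block is addressed with B_i bits."""
    return BLOCK_BITS // 2 ** input_bits

def build_lut(func, input_bits, full_scale):
    """Tabulate func at every address as an unsigned B_o-bit code."""
    max_code = 2 ** output_bits(input_bits) - 1
    table = []
    for address in range(2 ** input_bits):
        code = math.floor(func(address) / full_scale * max_code + 0.5)
        table.append(max(0, min(max_code, code)))  # clip to representable range
    return table

def inverse_sqrt_gain(t):
    # Assumed convention: saturate at full scale for t = 0, where 1/sqrt(t) diverges.
    return 1.0 if t == 0 else 1.0 / math.sqrt(t)

lut = build_lut(inverse_sqrt_gain, input_bits=8, full_scale=1.0)
print(len(lut), output_bits(8), lut[:5])
```

During operation the block simply returns `lut[address]`, so the cost of evaluating the function is one memory read regardless of how complicated `func` is.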
As mentioned above, these LUT functions play an ex-
tremely important role in speeding the functionality of
non-linear algorithms. The application may be as simple
as non-linear gain-scheduling of a controller or as compli-
cated as full quantum-mechanical state estimation with
the LUT performing functions based on assumed system
parameters. In general, it is a matter of judgment how to
partition complex algorithms, but any optimal partition
will likely involve the use of these LUTs to perform the
difficult parts of the calculation with minimal time delay.
2. Filters
PLD’s have a clear edge over analog circuitry in non-
linear processing, but they also have a potential advan-
tage in implementing precise, generic linear filters and
transfer functions.
A standard core element offered by Xilinx is the FIR
(Finite Impulse Response) filter. The FIR is defined in
discrete time as
$$y(n) = \sum_{i=0}^{N} a(i)\,u(n-i) \qquad (5)$$
where y(n) and u(n) are the output and input at the
discrete time n respectively. With standard Matlab
functions (firls, remez) one can specify an arbitrary
amplitude response and get out the corresponding a(i)
vector. The sampling frequency for a FIR element is
$f_F = f_C/B_U = 1/\tau_F$, where $B_U$ is the number of bits
chosen to represent u(n). Of course, the filter is useless
at shaping the response above this frequency. The group
delay of the signal through the filter is approximately
$\tau_F N/2$.
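To make these relations concrete, here is a small Python sketch that evaluates the sampling frequency and group delay from the expressions above and applies Eq. (5) directly. The clock frequency, input width and filter order are assumed values chosen for illustration, and fir_timing and fir_filter are hypothetical helper names.

```python
def fir_timing(clock_hz, input_bits, order):
    """Return (sample rate f_F, sample period tau_F, group delay tau_F*N/2)."""
    sample_rate = clock_hz / input_bits
    sample_period = 1.0 / sample_rate
    group_delay = sample_period * order / 2
    return sample_rate, sample_period, group_delay


def fir_filter(coeffs, samples):
    """Direct evaluation of y(n) = sum_{i=0}^{N} a(i) u(n - i), with u(n<0) = 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for i, a_i in enumerate(coeffs):
            if n - i >= 0:
                acc += a_i * samples[n - i]
        output.append(acc)
    return output


# Assumed values: 100 MHz clock, 12-bit input, N = 16 (17 taps).
f_f, tau_f, delay = fir_timing(100e6, 12, 16)
# A 4-tap moving average applied to a unit step.
step_response = fir_filter([0.25] * 4, [1.0] * 6)
```

With these assumed numbers the sample rate is about 8.3 MHz and the group delay is just under 1 microsecond, which indicates how quickly delay accumulates as N grows.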
The range of attenuation is also a concern in the design
of any filter. For an FPGA with $B_F$ bits entering
and leaving, the dynamic range is $20\log_{10}(2^{B_F})$ dB. For our
board with 12 bit ADC/DAC inputs and outputs, this
corresponds to roughly 70 dB. The designer should also have a
sense of the size of the input and output signals. If the
input signal is too high, the FPGA will rail; if the input is
too low, it will fail to rise above the smallest bit size. To
avoid these types of problems, broadband gain elements
can be used at the input and output of the FPGA board.
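As a check on the quoted figure, the dynamic range expression can be evaluated for a 12 bit word; the rounding down to 70 dB is the text's.

```latex
20\log_{10}\!\left(2^{B_F}\right) = 20\,B_F\log_{10}2 \approx 6.02\,B_F\ \mathrm{dB},
\qquad B_F = 12:\quad 6.02 \times 12 \approx 72\ \mathrm{dB}.
```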
A drawback of the FIR design is that it cannot be used
to control the phase response of its transfer function. On
the other hand, a generic continuous time linear transfer
function
$$G_C(s) = \frac{c(N)s^N + c(N-1)s^{N-1} + \ldots + c(1)}{d(N)s^N + d(N-1)s^{N-1} + \ldots + d(1)} \qquad (6)$$
where $Y_C = G_C U_C$, has phase control built in through
the denominator. To approximate this function on a
PLD, an Infinite Impulse Response (IIR) filter needs to
be used.
[Figure 5: block diagram with blocks 'FIR a', 'FIR b', three 'T' blocks and an adder; signal labels u, y, u_12, y_12, 'au', '-by'.]
FIG. 5: Implementation of IIR filter. ‘T’ components trim a
certain number of least significant bits from the data bus.
One possible IIR design process illustrates this need.
To generate a digital IIR design, first create GC(s) using
standard control techniques (Nyquist, LQR, etc.). Next,
convert from a continuous to a discrete transfer function
$$G_C \Rightarrow G_D(z) = \frac{a(0) + a(1)z^{-1} + \ldots + a(N)z^{-N}}{b(0) + b(1)z^{-1} + \ldots + b(N)z^{-N}} \qquad (7)$$
with the Matlab function c2d. We have used the definition
$Y_D = G_D U_D$ in the discrete time representation.
Apply an inverse z-transform ($z^{-1} \Rightarrow$ unit delay) to create the
discrete time difference equation
$$y(n) = \sum_{i=0}^{N} a(i)\,u(n-i) - \sum_{i=1}^{N} b(i)\,y(n-i) \qquad (8)$$
with the definition b(0) = 1. Finally, implement the dif-
ference equation in hardware as in Figure 5 with 2 FIR
blocks and 1 adder.
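The design steps above can be sketched in a few lines of Python for the simplest case, a single-pole low-pass. Note the assumptions: the discretization uses the bilinear (Tustin) transform written out by hand, which is only one of the methods c2d offers and not necessarily the one used here; the corner and sampling frequencies are illustrative; and the arithmetic is floating point, so the bit trimming of Figure 5 is not modelled.

```python
import math


def tustin_lowpass(corner_hz, sample_hz):
    """Bilinear-transform coefficients for G_C(s) = 1 / (1 + s/omega_c).

    Returns (a, b) with b[0] = 1, matching the normalization of Eq. (8).
    """
    k = 2 * math.pi * corner_hz / sample_hz  # omega_c * T
    a = [k / (2 + k), k / (2 + k)]
    b = [1.0, (k - 2) / (k + 2)]
    return a, b


def iir_filter(a, b, u):
    """Evaluate y(n) = sum_i a(i) u(n-i) - sum_{i>=1} b(i) y(n-i)."""
    y = []
    for n in range(len(u)):
        feedforward = sum(a[i] * u[n - i] for i in range(len(a)) if n - i >= 0)
        feedback = sum(b[i] * y[n - i] for i in range(1, len(b)) if n - i >= 0)
        y.append(feedforward - feedback)
    return y


# Hypothetical numbers: 10 kHz corner sampled at 1 MHz, driven by a unit step.
a, b = tustin_lowpass(10e3, 1e6)
step = iir_filter(a, b, [1.0] * 2000)
```

The two sums in iir_filter correspond to the 'au' and '-by' branches feeding the adder in Figure 5; the step response settles to the unity DC gain of the continuous design.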
With b(n > 0) = 0 the filter is just a FIR filter; however,
with b(n > 0) ≠ 0 the output is fed back to itself.
Hence a single input impulse continues to affect
the output indefinitely. Of course, with internal feedback loops, the
system is potentially unstable to noise and rounding er-
rors. For this reason, among others, the Xilinx ‘Core
Generator’ does not create flexible IIR modules.
However, with careful consideration of the number of
bits required at each stage, a stable IIR filter can be
created as in Figure 5. The sampling frequency for this
simple architecture is $f_C/(2B_Y)$, where $B_Y$ is the number of
bits used to keep track of y(n) internally. The factor
of 2 results from the delay of both the adder and the
FIR element. Because of the feedback, the IIR filter can
achieve a given amplitude response with a lower number
of coefficients than the FIR filter. This means the filter
delays the signal less. Even though the IIR has fewer
coefficients than an analogous FIR filter, the coefficients
of the IIR filter have to be specified to a greater degree
of precision to achieve the same amplitude response.
[Figure 6: block diagram with blocks EOM, VCO-AOM, Cavity, PZT and FPGA (T_lower, T_upper).]
FIG. 6: Feedback architecture for a Fabry-Perot Cavity. The
EOM puts sidebands on the beam necessary to generate the
locking signal. The FPGA algorithm T_upper maps the error
signal to the fast VCO-AOM frequency shifting combination.
The FPGA algorithm T_lower maps the signal to the slow
PZT.
IV. SPECIFIC EXAMPLE: CAVITY LOCK
We now discuss the use of an FPGA to perform a clas-
sical task necessary for low-noise experiments. High pre-
cision optical measurements demand laser intensity noise
be minimized as much as possible. In the adaptive phase
experiment mentioned above, the input laser is a Light-
wave Nd:YAG model 126 (1064 nm) with an inherent
broad relaxation oscillation noise peak at ∼ 100 kHz. To
perform broadband detection and control near 1 MHz,
this intensity noise must be removed from the beam with
a Fabry-Perot cavity.
A block diagram of the system is shown in Figure 6.
The output intensity of the cavity is stabilized with the
standard Pound-Drever-Hall method so that the error
signal is created from a reflected carrier beam with side-
bands. At low frequencies (below 100 Hz) the feedback
loop is dominated by a piezoelectric element (PZT) which
controls the length of the cavity. At higher frequencies
and through the closing point of the servo, the feedback
is from an AOM (Acousto-Optic Modulator) driven by a
VCO (Voltage Controlled Oscillator) which adjusts the
frequency of the input beam.
Given the control architecture of Figure 6, the design
process can be made very systematic with the flexibility
of the FPGA. Because the critical behavior of the servo
will be dominated by the VCO-AOM loop, we concentrate
on the design of $T_U$ (T_upper). First, the transfer
functions of the elements in the loop are measured. Here
we find that the VCO-AOM combination behaves like a
low-pass filter ($T_V$) with a corner at 100 kHz. The cavity
itself can be modelled as a low-pass filter ($T_C$) with a
corner at about 10 kHz (the cavity linewidth). The goal is
to design $T_U$ such that the closed loop transfer function
$T_{CL} = \frac{T_C T_V}{1 + T_C T_V T_U}$ is stable.
At this point, we can use the Matlab Control Toolbox
to design an optimal $T_U$. One option is to provide
the function lqr with the state space representations of
$T_V$ and $T_C$ and an appropriate cost function to create the
optimal $T_U$. The result simply tells us to make the
combination $T_C T_V T_U$ behave like an integrator
($T_I = 1/s = 1/j\omega$)
such that the controller satisfies the Nyquist criterion
with 90 degrees of phase margin.
There are practical problems with this approach. In
particular, the gain of $T_U$ must be infinite for very low
and very high frequencies. To remedy this, we flatten the
response of $T_U$ below 100 Hz (where the PZT arm takes
over) and roll off the response at 300 kHz, beyond the
closing point of the servo. So instead of making
$T_U = \frac{T_I}{T_C T_V}$ we use
$T_U = \frac{T_{LP1} T_{LP2}^2}{T_C T_V}$,
where $T_{LP1}$ is a low-pass
filter with the corner at 100 Hz and $T_{LP2}$ is a low-pass
filter with the corner at 300 kHz.
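The effect of this choice on the loop can be checked numerically. The sketch below models every element as an ideal single-pole low-pass with the corner frequencies quoted above, multiplies out the loop gain, and locates the unity gain point and phase margin. The overall gain K is an assumed value introduced only to place the crossing at a few tens of kHz; it is not a number from the experiment.

```python
import cmath
import math


def lowpass(f_hz, corner_hz):
    """Single-pole low-pass response 1 / (1 + j f / f_corner)."""
    return 1.0 / (1.0 + 1j * f_hz / corner_hz)


def loop_gain(f_hz, k):
    """T_C T_V T_U with T_U = k T_LP1 T_LP2^2 / (T_C T_V); the plant cancels."""
    t_c = lowpass(f_hz, 10e3)    # cavity, corner at the ~10 kHz linewidth
    t_v = lowpass(f_hz, 100e3)   # VCO-AOM, corner at 100 kHz
    t_u = k * lowpass(f_hz, 100.0) * lowpass(f_hz, 300e3) ** 2 / (t_c * t_v)
    return t_c * t_v * t_u


def unity_gain_frequency(k, f_lo=1e2, f_hi=1e6):
    """Bisect (on a log scale) for |loop gain| = 1; the magnitude falls with f."""
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(loop_gain(f_mid, k)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


K = 300.0  # assumed overall gain, not a value from the text
f_unity = unity_gain_frequency(K)
phase_margin_deg = 180.0 + math.degrees(cmath.phase(loop_gain(f_unity, K)))
```

Under these assumptions the loop crosses unity gain near 30 kHz with a phase margin somewhat below the ideal 90 degrees, the difference coming from the two 300 kHz roll-off poles.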
To get high gain at frequencies below 100 Hz, we make
$T_L$ (T_lower) behave as a low-pass filter with a corner at
only a few Hz. A better choice would be to implement $T_L$
as a high-gain analog integrator, but we use the FPGA
to implement $T_L$ here for demonstration purposes.
Next, we generate proper IIR coefficients for both
paths by the method described previously, treating $T_L$
and $T_U$ as the continuous transfer function $G_C$. With a
clock frequency of 100 MHz and an internal sample size
of $B_Y = 32$ bits, the IIR structure had an effective
bandwidth of 1.5 MHz ($f_C/(2B_Y)$), which is adequate to generate
the critical features of the transfer function.
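For reference, the quoted bandwidth follows from the stated clock frequency and word size:

```latex
\frac{f_C}{2B_Y} = \frac{100\ \mathrm{MHz}}{2 \times 32} = 1.5625\ \mathrm{MHz} \approx 1.5\ \mathrm{MHz}
```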
Figures 7 and 8 show the desired and actual transfer
functions of both arms. Each arm fails to match the
desired phase and amplitude response in a similar way.
First, because of the finite size of the sampling time, the
actual phase response differs from the desired response
as the frequency approaches the effective sampling fre-
quency. In fact, this mismatch happens lower than the
sampling frequency because of the delay of the IIR filter.
Second, at low frequencies, the FPGA gives less gain than
the desired result. This is due to the fact that we are
dealing with finite precision coefficients. The price paid
for having a large sampling frequency with small delay
is that we have less control over the size of the low
frequency gain. Finally, note that the PZT arm integrator
achieves the full 70 dB of expected range (input/output
size is 12 bits).
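One way to see how finite precision coefficients can limit the low frequency gain is to quantize the coefficients of a slow, high-gain low-pass and compare the gain at z = 1. The sketch below is only a plausible illustration under stated assumptions: a bilinear-transform discretization, a DC gain of 1000, a 3 Hz corner, and 16 fractional bits per coefficient are all values picked for the example rather than taken from the actual design.

```python
import math


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (fixed-point coefficient)."""
    scale = 2 ** frac_bits
    return round(value * scale) / scale


def dc_gain(a, b):
    """Gain of Eq. (7) at z = 1: sum of a(i) over sum of b(i)."""
    return sum(a) / sum(b)


# Assumed design: gain-1000 low-pass, 3 Hz corner, 1.5625 MHz sample rate,
# discretized with the bilinear transform (not necessarily what c2d used).
gain, corner_hz, sample_hz = 1000.0, 3.0, 100e6 / (2 * 32)
k = 2 * math.pi * corner_hz / sample_hz
a_ideal = [gain * k / (2 + k)] * 2
b_ideal = [1.0, (k - 2) / (k + 2)]

FRAC_BITS = 16  # assumed coefficient word length
a_fixed = [quantize(c, FRAC_BITS) for c in a_ideal]
b_fixed = [quantize(c, FRAC_BITS) for c in b_ideal]

ideal_db = 20 * math.log10(dc_gain(a_ideal, b_ideal))
fixed_db = 20 * math.log10(dc_gain(a_fixed, b_fixed))
```

Because the pole sits within about one part in 10^5 of z = 1, rounding b(1) moves it by an amount comparable to its distance from the unit circle, and in this example the quantized filter gives up roughly 2 dB of DC gain. The direction and size of the error depend on how the particular coefficients happen to round.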
The closed loop transfer function behavior for both
arms matches our expectations for noise rejection at low
frequencies. A mismatch at higher frequencies is due to
inadequate modelling of the PZT and other components.
(The PZT behaves more like a collection of oscillators
with different resonances than a low pass filter.) Quali-
tatively, the FPGA lock was much more robust to high
frequency noise than an analog version of the servo. This
was likely due to the precise match to the plant dynam-
ics near the unity gain point of the servo, achieved by
the use of large FIR coefficients. However, the FPGA
lock was unable to retain the lock over time scales more
than a few hours due to the saturated gain at very low
frequencies. This problem could easily be remedied by
using an analog integrator with more DC gain to replace
the FPGA PZT transfer function. The main advantage
of the FPGA is its fast accurate response and, besides
the demonstration presented here, there is no practical
reason to use the FPGA for high-gain, low-frequency
applications.
[Figure 7: magnitude (dB) and phase (deg) versus frequency (Hz) from 10^1 to 10^5 Hz, with traces labelled FPGA and Design.]
FIG. 7: Bode plot of T_lower (transfer function leading to
PZT). The design is a low-pass filter which dominates control
below ∼ 100 Hz.
[Figure 8: magnitude (dB) and phase (deg) versus frequency (Hz) from 10^1 to 10^5 Hz, with traces labelled FPGA and Design.]
FIG. 8: Bode plot of T_upper (transfer function leading to
VCO-AOM). The peak in phase is designed to stabilize the
plant through the unity gain point.
Finally, another feature of FPGA control is the possi-
bility of adding logical automation to this system. Specif-
ically, if the controller loses the lock, then the FPGA
could be programmed to sense this condition, sweep for
a signal, home in, and re-acquire the lock. The abstract
logical nature of VHDL code makes this task simple rel-
ative to the procedure needed to create an acquisition
system using standard electronics.
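The kind of re-acquisition logic described here amounts to a small state machine. The following Python sketch shows one possible structure; on the FPGA it would be written in VHDL. The choice of cavity transmission as the sensed signal, the three states, and all threshold values are assumptions made for illustration.

```python
from enum import Enum


class LockState(Enum):
    LOCKED = "locked"
    SWEEPING = "sweeping"
    ACQUIRING = "acquiring"


def next_state(state, transmission, lost_below=0.2, found_above=0.5,
               locked_above=0.8):
    """One clock tick of a re-lock supervisor driven by cavity transmission.

    All thresholds are hypothetical fractions of the peak transmitted power.
    """
    if state is LockState.LOCKED:
        # Transmission collapsing is taken as the sign that lock was lost.
        return LockState.SWEEPING if transmission < lost_below else LockState.LOCKED
    if state is LockState.SWEEPING:
        # Keep sweeping until a resonance starts to show up.
        return LockState.ACQUIRING if transmission > found_above else LockState.SWEEPING
    # ACQUIRING: servo engaged; confirm lock or fall back to the sweep.
    if transmission > locked_above:
        return LockState.LOCKED
    if transmission < lost_below:
        return LockState.SWEEPING
    return LockState.ACQUIRING


def run(readings, state=LockState.LOCKED):
    """Feed a sequence of transmission readings through the supervisor."""
    history = []
    for value in readings:
        state = next_state(state, value)
        history.append(state)
    return history


trace = run([0.9, 0.1, 0.05, 0.6, 0.7, 0.95])
```

In hardware, each state would additionally gate the outputs: a ramp on the PZT while sweeping, and the servo filters once a resonance is found.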
V. SUMMARY
To demonstrate the use of programmable logic tech-
nology in an otherwise familiar setting, we have concen-
trated on a linear control application. We have used this
example to convey the issues associated with a digital
controller, including design, latency, and discretization.
However, we have only hinted at the more interesting
advanced applications in experimental quantum optics
which are sure to develop more quickly because of this
technology. FPGAs and similar devices are particularly
suited to any physical system where non-linear mappings
are desired between output and input variables within the
natural dynamical time-scale. With these devices and
sufficiently protected quantum systems in hand, the field
of coherent quantum control may soon have enough speed
to match the intelligence of its proposed controllers.
ACKNOWLEDGEMENTS
J.S. acknowledges the support of a Hertz Foundation
Fellowship, and H.M. acknowledges the support of an A.
P. Sloan Research Fellowship. This work was supported
by the NSF under grant PHY-9987541, and by the ONR
under Young Investigator Award N00014-00-1-0479.
[1] C. A. Sackett et al., Nature 404, 256 (2000).
[2] M. R. Andrews et al., Science 273, 84 (1996).
[3] C. J. Hood et al., Science 287, 1447 (2000).
[4] S. Habib, K. Jacobs, and H. Mabuchi, Feedback control of atomic motion in an optical cavity, unpublished.
[5] A. C. Doherty et al., Phys. Rev. A 62, 012105 (2000); A. C. Doherty et al., Phys. Rev. A 63, 062306 (2001).
[6] H. M. Wiseman and R. B. Killip, Phys. Rev. A 57, 2169 (1998); D. W. Berry and H. M. Wiseman, Phys. Rev. A 63, 013813 (2000).
[7] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000).
[8] H. Rabitz et al., Science 288, 824 (2000).
[9] D. Stranneby, Digital Signal Processing (Newnes, 2001).
