Arbitrary digital pulse sequence generator with delay-loop timing by Hošák, Radim & Ježek, Miroslav
Arbitrary digital pulse sequence generator with delay-loop timing
Radim Hošák1 and Miroslav Ježek1, a)
Department of Optics, Faculty of Science, Palacký University, 17. listopadu 12, 77146 Olomouc,
Czech Republic
(Dated: January 9, 2018)
arXiv:1801.02433v1 [physics.ins-det] 8 Jan 2018
We propose an electronic multi-channel arbitrary digital sequence generator with
temporal granularity equal to a single clock cycle. We implement the generator with 32
channels using a low-cost ARM microcontroller and demonstrate its capability to produce
temporal delays ranging from tens of nanoseconds to hundreds of seconds, with 12 ns timing
granularity and linear scaling of delay with respect to the number of delay loop iterations.
The generator is optionally synchronized with an external clock source to provide 100 ps
jitter and overall sequence repeatability within the whole temporal range. The generator is
fully programmable and able to produce digital sequences of high complexity. The concept of
the generator can be implemented using different microcontrollers and applied to control
various optical, atomic, and nuclear physics measurement setups.
I. INTRODUCTION
An arbitrary digital pulse sequence generator is a
crucial tool for many instrumentation, automation,
and metrological tasks. It is frequently used to control
and synchronize other devices to form highly complex
setups like magnetic resonance imaging1, trapped-ion
experiments2,3, and fast reconfigurable photonic circuits4
for quantum simulations and quantum communication
networks5,6, to name a few applications. A large num-
ber of electronic and optoelectronic building blocks such
as switches, digital attenuators, direct digital synthesiz-
ers, digital-to-analog converters, electro-optic or acousto-
optic modulators, and gated solid-state detectors, which
are employed in these setups, require many digital con-
trol channels with precise timing. Commercial solutions
typically offer limited flexibility and impractically large
resources per channel. Arbitrary waveform generators
(AWGs) and digital delay generators (DDGs) represent
two examples of the frequently used commercial devices.
AWGs allow a high degree of sequence control, which goes
far beyond a digital pattern, but are severely limited in
sequence length. They are very inefficient in producing
long sparse sequences. DDGs offer picosecond timing and
long delays but are unable to generate complex sequences
with many changes of the output signal.
The lack of flexible digital generators with reasonable
timing properties and a large enough number of output
channels leads to the development of custom-built so-
lutions based on prototyping platforms, such as micro-
controllers (MCUs)7–9 and field programmable gate ar-
rays (FPGAs)10–12. The custom solutions often also of-
fer analog and radio frequency signals9,11. MCUs have
gained significant computing power, they can be easily programmed,
and their large number of peripherals makes them highly flexible.
The downside is that reaching full temporal control down to a single
clock cycle is challenging. FPGAs offer single-cycle or even sub-cycle
resolution by employing a serialization approach10. However, development
on this platform is more complicated than on MCUs; asynchronous design
in particular is challenging.
a)Electronic mail: jezek@optics.upol.cz
The design of a custom digital pulse sequence genera-
tor, further called pulsebox, typically utilizes one of three
basic approaches: (1) timer peripherals, (2) recalling the
sequence from a fast memory, and (3) communicating
the sequence at runtime. Using timers to clock delays
between the sequence events7,9 offers good temporal
resolution, but the corresponding interrupt overhead
increases the minimum time interval between
the events. The fast memory approach allows for single-clock
sequence programming; however, it imposes a strict
resolution vs. sequence length trade-off, as digital states
of all channels for every clock cycle have to be saved
in the memory. This approach is particularly suitable
for short sequences with high temporal resolution12,13.
Alternatively, the digital states of the pulsebox can be
programmed during its runtime using external commu-
nication, which imposes no limitation on the sequence
complexity10. However, ongoing communication overhead
limits the minimum time interval between the sequence
events and makes this solution suitable mainly
for long and sparse sequences.
The presented work focuses on a custom pulsebox de-
vice which produces digital signals on many output chan-
nels. The output signals take on the form of arbitrary
pulse sequences, such as the one in Fig. 1. The sequences
may contain both rapid successions of short pulses and
pulses with great temporal separation. It is desirable
that both these features be available within a single
sequence generated by the pulsebox. We propose here a
concept of such a flexible pulsebox, its implementation using
an ARM microcontroller, and a set of characteristics
along with methods for their measurement.
II. DESIGN
A pulse sequence can be understood as a number of
digital output changes occurring at specified times. The
pulsebox can reproduce any pulse sequence, as long as for
each output change two pieces of information are known:
the desired digital state of all output channels after the
output change, and the time at which the output change
is to occur. Our design relies on a 32-bit microcontroller
unit (MCU) whose digital pins are divided into ports
of 32 and individually tied to the pulsebox digital outputs,
either directly or via auxiliary circuitry such as level
shifters or buffer amplifiers. The states of the 32 individual
digital pins of each port are governed by a 32-bit
register. Digital pins belonging to the same port are used
so that the state of all the pulsebox output channels can
be changed synchronously by manipulating a single register.
Sixteen randomly selected channels were characterized
during the calibration stage to demonstrate correct
operation. One digital pin of the MCU is used as an
input for an external trigger signal.
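The single-register update can be illustrated with a short sketch that packs the desired states of up to 32 channels into one 32-bit word. The function name `port_word` and the convention that channel k maps to bit k of the register are assumptions made for this example only; the actual pin assignment of the device is not specified here.

```python
def port_word(high_channels):
    """Return the 32-bit register value with the listed channels set high.

    Assumption: channel k drives bit k of the port register (k = 0..31).
    """
    word = 0
    for ch in high_channels:
        if not 0 <= ch < 32:
            raise ValueError(f"channel {ch} outside 0..31")
        word |= 1 << ch
    return word


# Channels 0, 1 and 31 high, all others low.
print(f"0x{port_word([0, 1, 31]):08X}")  # prints 0x80000003
```

Because all channels live in one word, writing that word changes every output in the same register access, which is the property the design relies on.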
To correctly reproduce the pulse sequence, the out-
put changes need to occur at the specified times. With-
out artificially introduced delay, two consecutive output
changes will be separated by a time interval correspond-
ing to the time it takes the MCU to modify the port
register. If the output changes need to be more sepa-
rated in time, it is necessary to produce an extra delay by
instructing the MCU to stay idle using the NOP assem-
bly instruction14. This performs a no-operation which
takes one clock cycle. By repeating this instruction in
the source code, the distance between two output changes
can be adjusted in steps of one clock cycle. This is the
best granularity possible for an MCU-based pulsebox.
Using the NOP instruction yields very good granularity,
but is not desirable for long delays, as the MCU
program memory would fill up very quickly. Instead,
the NOP instruction is executed repeatedly inside a delay loop.
The loop can be repeated a very large number of times, making a
very large temporal distance between consecutive output
changes possible. However, the loop consists of multiple
instructions. This means that the granularity is coarser
than when using NOP instructions exclusively.
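A rough estimate shows how quickly NOP-only delays would exhaust the program memory. The sketch below uses the 84 MHz master clock quoted in Sec. IV and assumes a 2-byte Thumb encoding of NOP and 512 KiB of program flash; the last two figures are assumptions about the MCU that are not stated in this text.

```python
CLOCK_HZ = 84_000_000      # master clock quoted in Sec. IV
NOP_BYTES = 2              # assumption: 16-bit Thumb NOP encoding
FLASH_BYTES = 512 * 1024   # assumption: 512 KiB of program flash


def nop_bytes_for_delay(delay_s):
    """Program memory consumed by a delay made of single-cycle NOPs only."""
    cycles = round(delay_s * CLOCK_HZ)
    return cycles * NOP_BYTES


def longest_nop_only_delay_s():
    """Longest delay that fits in flash if nothing else were stored there."""
    return (FLASH_BYTES // NOP_BYTES) / CLOCK_HZ


print(nop_bytes_for_delay(1e-3), "bytes for a 1 ms delay")
print(round(longest_nop_only_delay_s() * 1e3, 2), "ms at most")
```

Under these assumptions a single millisecond of delay already costs about 168 kB, and the whole flash holds only about 3 ms, which motivates the delay loop.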
The delay loop consists of two distinct parts—the setup
and the looped section. The setup specifies the number
of loop iterations and its processing by the MCU takes a
constant amount of time. The looped section is repeated
a given number of times and is responsible for most of
the delay achieved via the delay loop. The time spent in
the delay loop setup gives a lower bound on the length of
the shortest pulse achievable using this method, and the
length of the looped section determines the granularity
with which we can vary the delay lengths. The loop has
thus been designed to minimize the number of instruc-
tions in both the setup and the looped section, leading
us to implement it in assembly language.
To access the entire range of achievable temporal dis-
tances between consecutive output changes, and still
benefit from the one-cycle granularity, both the NOP
approach and the delay loop approach must be com-
bined. The only drawback here comes from the way the
MCU predicts which instructions will be performed next.
When conditional branches occur, which is the case for
the delay loops, the program execution might stall for up to
three clock cycles before the MCU discovers which instructions
follow. This could lead to a deviation of up to three clock
cycles in the output change timing15. Fortunately,
this is a systematic deviation and not a jitter, as the
timing stays constant throughout sequence repetitions.
This effect can be eliminated by a detailed calibration
that would provide feedback for the sequence input.
Figure 1. An example of a complex three-channel pulse sequence
(channels 1 to 3, time axis 0 to 500 µs). The sequence pattern
consists of both rapid successions of pulses and output changes
which occur sparsely in time. The highlighted regions near the
beginning, midpoint, and ending of the sequence are shown in
greater detail. The shortest structure takes 120 ns, the whole
sequence has a duration of 0.5 ms. The depicted waveforms are the
actual output of a prototype of the presented pulsebox device.
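The combination of the NOP approach and the delay-loop approach can be sketched as a decomposition of a target delay into a number of loop iterations plus a remainder of NOP instructions. The cycle counts below are the ones reported for the prototype in Sec. III (2 cycles between back-to-back output changes, 12 cycles with a one-iteration loop, 5 cycles per further iteration). The function names and the assumption that all contributions add linearly, with branch stalls ignored, are illustrative simplifications rather than the actual software of the device.

```python
MIN_CYCLES = 2         # two back-to-back register writes (Sec. III)
LOOP_BASE_CYCLES = 12  # delay with a one-iteration loop (Sec. III)
LOOP_STEP_CYCLES = 5   # extra cycles per additional iteration (Sec. III)


def plan_delay(target_cycles):
    """Split a delay in clock cycles into (loop_iterations, nops).

    Assumption: contributions add linearly and branch stalls are ignored.
    """
    if target_cycles < MIN_CYCLES:
        raise ValueError("shorter than two consecutive register writes")
    if target_cycles < LOOP_BASE_CYCLES:
        return 0, target_cycles - MIN_CYCLES  # NOPs only, no loop
    extra = target_cycles - LOOP_BASE_CYCLES
    iterations = 1 + extra // LOOP_STEP_CYCLES
    nops = extra % LOOP_STEP_CYCLES
    return iterations, nops


def planned_cycles(iterations, nops):
    """Delay produced by a plan under the same linear model."""
    if iterations == 0:
        return MIN_CYCLES + nops
    return LOOP_BASE_CYCLES + (iterations - 1) * LOOP_STEP_CYCLES + nops


print(plan_delay(20))  # prints (2, 3): two iterations plus three NOPs
```

In this model at most four NOPs ever accompany a loop, so one-cycle granularity is retained at a small, constant memory cost per delay.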
The final part of the design is the software package
for user-friendly reconfiguration of the device. The soft-
ware gathers user-provided data about the shape of the
sequence, generates a source code file with instructions
for the pulsebox to reproduce the sequence, compiles it,
and uploads the resulting binary file to the MCU. The
software is able to extract the sequence shape data from
different user-provided information, such as the positions
and lengths of individual pulses. The source code is cre-
ated by gluing together instructions for changes of states
of the MCU digital pins with delay loops that provide
the MCU with idle time between output changes.
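The source-generation step can be pictured with the following minimal sketch, which glues register writes together with delay loops and NOPs. The macro names `SET_PORT`, `DELAY_LOOP`, and `NOP` are hypothetical placeholders for the register write, the assembly delay loop, and the no-operation instruction; the real software package and its output format are not described in this text.

```python
def generate_source(events):
    """Build source text from (port_word, loop_iterations, nops) tuples.

    SET_PORT, DELAY_LOOP and NOP are hypothetical macro names standing in
    for the register write, the assembly delay loop and the no-operation.
    """
    lines = []
    for word, iterations, nops in events:
        lines.append(f"SET_PORT(0x{word & 0xFFFFFFFF:08X});")
        if iterations > 0:
            lines.append(f"DELAY_LOOP({iterations});")
        lines.extend(["NOP();"] * nops)
    return "\n".join(lines)


# One pulse on channel 0: set high, wait, set low.
print(generate_source([(0x1, 3, 2), (0x0, 0, 0)]))
```

The generated text would then be compiled and uploaded; because the sequence lives in program memory, its length is bounded by the number of such lines rather than by its duration.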
Our physical realization of the concept mentioned
above relies on the Arduino Due development board
based on the Atmel SAM3X8E microcontroller16. Ar-
duino Due has been previously used for custom-built con-
trol solutions in several advanced applications including
optical tweezers, preparation and measurement of non-
classical states of light, and quantum memory17–19. Im-
plementing the pulsebox on this low-cost development
board demonstrates the feasibility of the delay-loop con-
cept and, also, further extends the applicability of similar
MCUs for advanced analog as well as digital control.
III. PULSEBOX PERFORMANCE
To characterize the performance and capabilities of the
device, we have chosen these figures of merit: (1) the min-
imum and (2) maximum time interval between output
changes, (3) timing granularity, (4) linearity, (5) run-to-run
uncertainty, (6) inter-channel simultaneity, (7) trigger
latency, and (8) maximum sequence complexity.
Figure 2. The delay duration increment [s] is shown as a function
of the increment in the number of delay-loop iterations
(logarithmic axes). The calibration data are fitted to a linear
function. See text for more details.
Figure 3. The delay duration span (cross marks, solid lines)
and the absolute value of the mismatch between the data and
the calibration line fit (circle marks, dotted lines) are shown
for each calibration run (uncertainty [s] versus loop iterations,
logarithmic axes). An increase in both the delay span
and the fit mismatch, caused by the drift and/or instability
of the internal clock, is seen after 10^5 iterations.
To measure the minimum time interval between output
changes, two output changes were made in immediate
succession on a single channel, resulting
in a single pulse. The minimum pulse length was taken as
the time interval between the 50% crossings of the rising
and falling edges of the pulse, measured using
an oscilloscope. For our device, the result was 24 ns,
which corresponds to two clock cycles of the MCU. Using
the delay loop approach, the minimum pulse length
was achieved using a single-iteration loop and was
measured to be 144 ns, or 12 clock cycles.
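The correspondence between clock cycles and the quoted durations can be checked with a short calculation. It assumes the nominal 84 MHz master clock given in Sec. IV; the helper name `cycles_to_ns` is introduced only for this example.

```python
CLOCK_HZ = 84_000_000  # nominal master clock quoted in Sec. IV


def cycles_to_ns(cycles):
    """Duration of a whole number of clock cycles in nanoseconds."""
    return cycles * 1e9 / CLOCK_HZ


for n in (1, 2, 12):
    print(n, "cycles =", round(cycles_to_ns(n), 1), "ns")
```

The nominal values of about 11.9 ns, 23.8 ns, and 142.9 ns agree with the quoted 12 ns, 24 ns, and 144 ns when the cycle is rounded to 12 ns.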
The device achieves the ultimate granularity using the
NOP approach. By incrementally adding NOP instructions
between two output changes, we observed the pulse
lengthening in steps of 12 ns, or one clock cycle.
Additionally, it is of interest to know the granularity of
the delay loop approach. Instead of inferring this information
in the same manner as in the case of the repeated
NOP instructions, we perform a complete calibration of
the dependence of delay duration on the number of
delay-loop iterations.
The delay-loop calibration relies on measurements of
the delay between two consecutive output changes for
different numbers of delay-loop iterations, starting from one
iteration and going up to the maximum number. The delay
was measured using a time-to-digital converter (TDC,
UQDevices) sensitive to rising edges only, so a two-pulse
sequence was used for the measurements, with the delay loop
placed between the pulses. We are interested in the delay
between the falling edge of the first pulse and the rising edge
of the second one and, more specifically, in how this delay
lengthens as the number of delay-loop iterations increases.
We therefore took the TDC result for a one-iteration delay
loop as a reference and subtracted it from
the results for the other numbers of iterations. The
result is a calibration curve showing a linear dependence,
which tells us by what amount
the delay between two output changes is prolonged if the
number of iterations of the corresponding delay loop is
increased from one to a given value, see Fig. 2.
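The reference subtraction described above can be sketched as follows. The function name, the dictionary layout, and the numbers in the usage example are invented for illustration and are not measured data.

```python
def delay_increments(tdc_results):
    """Subtract the one-iteration reference from every TDC result.

    tdc_results maps the number of loop iterations to the measured
    interval between rising edges in seconds; key 1 is the reference.
    Returns {iteration increment: delay increment in seconds}.
    """
    reference = tdc_results[1]
    return {n - 1: t - reference for n, t in tdc_results.items() if n != 1}


# Illustrative (not measured) intervals for 1, 2 and 11 iterations.
example = {1: 200e-9, 2: 260e-9, 11: 800e-9}
print(delay_increments(example))
```

Subtracting the reference removes the constant contributions of the first pulse width and the loop setup, so only the part of the delay that scales with the iteration count remains.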
The whole calibration procedure was repeated 9 times
to verify the stability and the linearity of the delay-loop
approach. For each loop-iteration increment we evalu-
ated a min-max span of the corresponding delay duration
across all the measurements, which characterizes the run-
to-run uncertainty of the given delay, see Fig. 3. In our
case, the uncertainty is of the order of 100 ps for delays
up to 100 ms and increases rapidly for larger delays.
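The min-max span used as the run-to-run uncertainty can be computed as in the following sketch. The data layout (one dictionary per calibration run) and the example values are assumptions chosen for illustration.

```python
def run_to_run_span(runs):
    """Min-max span of each delay across repeated calibration runs.

    runs is a list of dicts, one per calibration run, each mapping an
    iteration increment to the measured delay increment in seconds.
    """
    spans = {}
    for increment in runs[0]:
        values = [run[increment] for run in runs]
        spans[increment] = max(values) - min(values)
    return spans


# Three illustrative runs with arbitrary values (not measured data).
runs = [{10: 1.0, 100: 5.0}, {10: 1.5, 100: 4.0}, {10: 0.5, 100: 4.5}]
print(run_to_run_span(runs))  # prints {10: 1.0, 100: 1.0}
```

A min-max span is a conservative measure: with only a handful of repetitions it bounds the observed scatter without assuming any particular distribution.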
To verify the linear dependence, the acquired
data were fitted to a linear function y = a×x. From the
slope of the linear fit, we estimated the delay-loop timing
granularity to be 60 ns, or 5 clock cycles. The mismatch
between individual calibrations and the linear fit shows
features similar to those of the run-to-run uncertainty, see Fig. 3.
We will show later that the increase of the uncertainty as
well as of the mismatch is caused by internal clock drift
and can be entirely removed using an external clock source.
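A fit of the model y = a×x and the resulting mismatch can be sketched as below. The text specifies only the model; ordinary unweighted least squares is assumed here, and the data in the usage example are synthetic values in units of clock cycles.

```python
def fit_through_origin(x, y):
    """Least-squares slope a of the model y = a * x (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def fit_mismatch(x, y, a):
    """Absolute deviation of each data point from the fitted line."""
    return [abs(yi - a * xi) for xi, yi in zip(x, y)]


# Synthetic data: 5 clock cycles per loop-iteration increment.
x = [1, 10, 100]
y = [5, 50, 500]
a = fit_through_origin(x, y)
print(a, fit_mismatch(x, y, a))  # prints 5.0 [0.0, 0.0, 0.0]
```

Omitting the intercept is consistent with the reference subtraction, since a zero increment in iterations must give a zero increment in delay. Note that an unweighted fit over many decades is dominated by the largest delays.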
The maximum time interval between output changes
was obtained from the calibration data for the delay loop
with the highest possible number of iterations, with the
resulting value of 255.7 s.
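The reported maximum is consistent with a simple estimate, under the assumption (not stated in the text) that the loop counter is a 32-bit value allowing 2^32 iterations, each taking 5 cycles of the nominal 84 MHz clock.

```python
CLOCK_HZ = 84_000_000     # nominal master clock quoted in Sec. IV
CYCLES_PER_ITERATION = 5  # delay-loop granularity from the linear fit
MAX_ITERATIONS = 2 ** 32  # assumption: 32-bit loop counter

max_delay_s = MAX_ITERATIONS * CYCLES_PER_ITERATION / CLOCK_HZ
print(round(max_delay_s, 1), "s")  # prints 255.7 s
```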
The inter-channel simultaneity of output changes has
also been studied. When all pulsebox channels were set
to perform the change from a low voltage level to a high
voltage level, the time difference between the rising edges
at 50% signal level was found to be 2 ns at maximum
between the slowest and the fastest channel. This value
represents an upper bound on the simultaneity including
all the inter-channel delays originating in the pulsebox,
cables, and connectors.
The trigger latency has been measured by running a
single-pulse sequence in the triggered mode of operation.
A signal from an external generator is fed to the pulsebox.
A copy of the signal is used to trigger an oscilloscope.
The trigger latency is the time interval between the 50%
crossings of the trigger signal and the first pulse of the
pulsebox output. With the device in its current form,
the trigger latency is around 900 ns, with roughly 60 ns
peak-to-peak jitter. These values are more than suffi-
cient taking into account that the pulsebox is typically
triggered by external events with periods much larger
than 1 µs, for example to synchronize the pulsebox with
the power line frequency or with data acquisition. Once
triggered, the output changes at all channels within a
single sequence are generated with negligible sub-ns run-
to-run uncertainty and used to control all the circuitry
and devices in the setup.
The maximum sequence complexity is characterized in
terms of the maximum number of output changes and
delay loops that can occur in a sequence. With rising
complexity, the MCU program memory gradually fills up,
until its capacity is exceeded. The maximum complex-
ity was estimated using a model pulse sequence which
uses a delay loop between every two consecutive output
changes. For our current implementation the most com-
plex sequence of this kind that we were able to generate
consisted of 10670 output changes and 10669 delay loops.
IV. OPERATION WITH EXTERNAL CLOCK SOURCE
By default, the Arduino Due development board runs
at an 84 MHz master clock frequency, which is obtained
from a 12 MHz crystal oscillator using clock dividers
and multipliers. Replacing this crystal with an external
clock makes it possible to synchronize the operation of
the pulsebox with other devices used in the experimental
setup and eliminate the drift of the internal clock. In-
stead of the common 10 MHz external clock we used a
12 MHz one to facilitate the comparison of the pulsebox
timing properties with the internal clock.
The calibration procedure has been repeated 8 times
for the pulsebox with the external clock source, and in-
deed it shows a sub-nanosecond run-to-run uncertainty
over the entire range of delays, see Fig. 4. The fit mis-
match was also reduced below one nanosecond for all
the acquired measurements and all possible delays of the
pulsebox.
V. POSSIBLE EXTENSIONS
The functionality of the device can be further extended
and some of the characteristics can be improved. As for
the number of digital output channels, it is possible to
go beyond 32. However, this means that more digital pin
ports must be used and that multiple registers will some-
times have to be modified in order to perform an output
change. This automatically leads to worse inter-channel
output change simultaneity, as multiple corresponding
registers cannot be changed using a single instruction.
This can, however, be compensated in post-processing
using a digital delay. The upper bound on the distance
between two consecutive output changes can be overcome
by either performing multiple delay loops or by nesting
a delay loop inside another loop. The nested loop approach
would yield a substantial increase of the range of
possible delays.
Figure 4. Calibration results for pulsebox operating with an
external clock source (uncertainty [s] versus loop iterations,
logarithmic axes). The delay duration span (cross marks,
solid lines) and fit mismatch (circle marks, dotted lines) are
presented, demonstrating perfectly stable operation with
sub-nanosecond jitter.
Using an external clock source opens up the possibility
of overclocked device operation. A higher clock rate allows
for shorter pulses and finer granularity. The datasheet of
the used MCU specifies the external oscillator frequency
to be in the range of 3–20 MHz. This frequency is, by de-
fault, effectively multiplied by a factor of 7. Overclocked,
and even underclocked, operation has been achieved in
our tests. However, we have observed that uploading
code to the device fails for some frequencies of the ex-
ternal clock source. This can be avoided by changing
the clock frequency to the 12 MHz default for the upload
and manipulating the clock frequency only afterwards.
Furthermore, the operation stopped if the master clock
frequency was too high. The master clock frequency can
be further manipulated by changing the quotients of the
built-in frequency dividers and multipliers, which will be
subject to further tests. Alternatively, an MCU supporting
a faster clock can be used to reach even finer granularity.
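The effect of the external clock frequency on the timing granularity can be estimated with the sketch below. It uses the default multiplication factor of 7 quoted above; the function names are introduced only for this example, and whether the device actually operates at a given frequency has to be tested.

```python
PLL_FACTOR = 7  # default effective multiplication quoted in the text


def master_clock_hz(external_hz):
    """Master clock obtained from the external oscillator frequency."""
    return external_hz * PLL_FACTOR


def granularity_ns(external_hz):
    """Single-cycle timing granularity for a given external clock."""
    return 1e9 / master_clock_hz(external_hz)


for f_ext in (10e6, 12e6, 16e6):
    print(f"{f_ext / 1e6:.0f} MHz -> "
          f"{master_clock_hz(f_ext) / 1e6:.0f} MHz, "
          f"{granularity_ns(f_ext):.2f} ns per cycle")
```

With the default factor, a 12 MHz source reproduces the 84 MHz master clock, while a common 10 MHz reference would give 70 MHz and a correspondingly coarser cycle of about 14.3 ns.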
The presented pulsebox generates unipolar signals.
Bipolar operation can be attained using a two-way com-
biner to merge two unipolar channels into a single bipo-
lar one, which was demonstrated for pulse generators by
Strachan et al. and Haylock et al.12,13.
Finally, some of the channels could be used to feed
programmable attenuators/amplifiers, direct digital
synthesizers, or digital-to-analog converters. This would
enable radio-frequency or arbitrary analog outputs. The
performance of these outputs, such as the minimum distance
between two signal changes, would depend on the
parameters of the employed extension device. We tested
interfacing the pulsebox to a programmable digital attenuator
operating from 9 kHz to 6 GHz (Mini Circuits
ZX76-31R75PP+). Using direct parallel programming, we can
command the attenuator to set a new attenuation level
by sending a 7-bit digital word within one clock cycle of our
pulsebox20.
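Parallel programming of such an attenuator can be pictured as writing a 7-bit field into the port register alongside the other channels. In the sketch below the 0.25 dB step size, the position of the field (`bit_offset`), and the active-high bit polarity are all assumptions made for illustration; the actual wiring and encoding should be taken from the attenuator datasheet.

```python
STEP_DB = 0.25  # assumption: 0.25 dB per code step
BITS = 7        # word width quoted in the text


def attenuation_word(att_db, bit_offset=0):
    """Return the port bits encoding an attenuation level.

    bit_offset is a hypothetical position of the 7-bit field inside
    the 32-bit port register.
    """
    code = round(att_db / STEP_DB)
    if not 0 <= code < 2 ** BITS:
        raise ValueError("attenuation outside the 7-bit range")
    return code << bit_offset


def set_field(port_value, att_db, bit_offset=0):
    """Replace only the attenuator field, leaving other channels intact."""
    mask = (2 ** BITS - 1) << bit_offset
    return (port_value & ~mask) | attenuation_word(att_db, bit_offset)


print(f"0x{set_field(0x00000001, 10.0, bit_offset=8):08X}")
```

Since the whole word is written in one register access, the new attenuation level and any other output changes take effect in the same clock cycle.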
VI. CONCLUSION
We have presented a general-purpose digital pulsebox
capable of producing complex sequences of pulses both
in rapid succession and with great temporal separation,
which sets it apart from devices built for one specific
application. We have implemented the pulsebox using a
low-cost ARM microcontroller and demonstrated its wide
functionality, considerable precision, and sub-nanosecond
timing jitter. The pulsebox timing granularity is 12 ns,
or one clock cycle, achieved by using delay loops to-
gether with single-cycle no-operation instructions. The
presented 24 ns (2 clock cycles) minimum interval be-
tween output changes is a significant improvement over
many-cycle intervals presented by many previous works.
The employed delay-loop approach allows for extensive
delays up to 255 s between the output changes while
preserving the high temporal resolution and preventing
excessive memory usage. The information about the shape of
the sequence is communicated to the pulsebox by storing
a specific program code in the memory. This approach,
combined with the program memory capacity available
on the MCU employed, yields the maximum sequence
complexity of more than 20,000 output changes and de-
lays.
Apart from designing the pulsebox itself, a set of
characteristics and the methods of their measurement,
applicable to a broad spectrum of similar devices, has
been devised. We believe that these basic character-
istics will facilitate characterization and comparison
of future multi-channel digital pulse sequence generators.
ACKNOWLEDGMENTS
This work was supported by the Czech Science Foun-
dation (project 17-26143S). RH also acknowledges the
support by the Palacký University (project IGA-PrF-2017-008).
We would like to thank M. Dudka, I. Straka,
P. Obšil, and L. Slodička for many fruitful discussions on
the pulsebox concept and applications.
REFERENCES
1R. W. Brown, Y.-C. N. Cheng, E. M. Haacke, M. R. Thomp-
son, and R. Venkatesan, Magnetic Resonance Imaging: Physical
Principles and Sequence Design (John Wiley and Sons, 2014).
2H. Häffner, C. F. Roos, and R. Blatt, Phys. Rep. 469, 155 (2008).
3S. Debnath, N. M. Linke, C. Figgatt, K. A. Landsman,
K. Wright, and C. Monroe, Nature 536, 63 (2016).
4D. Bonneau, M. Lobino, P. Jiang, C. M. Natarajan, M. G. Tan-
ner, R. H. Hadfield, S. N. Dorenbos, V. Zwiller, M. G. Thompson,
and J. L. O’Brien, Phys. Rev. Lett. 108, 053601 (2012).
5M. Gräfe, R. Heilmann, M. Lebugle, D. Guzman-Silva, A. Perez-Leija,
and A. Szameit, J. Opt. 18, 103002 (2016).
6O. Alibart, V. D’Auria, M. De Micheli, F. Doutre, F. Kaiser,
L. Labonté, T. Lunghi, É. Picholle, and S. Tanzilli, J. Opt. 18,
104001 (2016).
7S. Handa, T. Domalain, and K. Kose, Rev. Sci. Instrum. 78,
084705 (2007).
8P. E. Gaskell, J. J. Thorn, S. Alba, and D. A. Steck, Rev. Sci.
Instrum. 80, 115103 (2009).
9E. E. Eyler, Rev. Sci. Instrum. 82, 013105 (2011).
10L. Sun, J. J. Savory, and K. Warncke, Concepts Magn. Reson.
43, 100 (2013).
11T. Pruttivarasin and H. Katori, Rev. Sci. Instrum. 86, 115106
(2015).
12B. Haylock, F. Lenzini, S. Kasture, P. Fisher, E. W. Streed, and
M. Lobino, Rev. Sci. Instrum. 87, 054709 (2016).
13J. P. Strachan, V. Chembrolu, X. W. Yu, T. Tyliszczak, and
Y. Acremann, Rev. Sci. Instrum. 78, 054703 (2007).
14L. D. Pyeatt, Modern Assembly Language Programming with the
ARM Processor (Elsevier, 2016).
15ARM Limited, DDI0337H cortex m3 r2p1 trm (2010), revision
r2p1.
16Atmel Corporation, Atmel-11057C-ATARM-SAM3X-SAM3A-
Datasheet 23-Mar-15 (2015).
17D. Nino, H. Wang, and J. N. Milstein, Eur. J. Phys. 35, 055009
(2014).
18K. Cox, G. Greve, J. Weiner, and J. Thompson, Phys. Rev. Lett.
116, 093602 (2016).
19D. Saunders, J. Munns, T. Champion, C. Qui, K. Kaczmarek,
E. Poem, P. Ledingham, I. Walmsley, and J. Nunn, Phys. Rev.
Lett. 116, 090501 (2016).
20I. Straka et al., in preparation (2017).
