FPGA Stream-Monitoring of Real-time Properties by Baumeister, Jan et al.
arXiv:2003.12477v1 [cs.DC] 18 Mar 2020
FPGA Stream-Monitoring of Real-time Properties
Jan Baumeister
Saarland University
Department of Computer Science
Saarbrücken, Saarland, Germany
jbaumeister@react.uni-saarland.de
Bernd Finkbeiner
Saarland University
Department of Computer Science
Saarbrücken, Saarland, Germany
finkbeiner@react.uni-saarland.de
Maximilian Schwenger
Saarland University
Department of Computer Science
Saarbrücken, Saarland, Germany
schwenger@react.uni-saarland.de
Hazem Torfah
Saarland University
Department of Computer Science
Saarbrücken, Saarland, Germany
torfah@react.uni-saarland.de
ABSTRACT
An essential part of cyber-physical systems is the online evaluation
of real-time data streams. Especially in systems that are intrinsi-
cally safety-critical, a dedicated monitoring component inspecting
data streams to detect problems at runtime greatly increases the
confidence in a safe execution. Such a monitor needs to be based
on a specification language capable of expressing complex, high-
level properties using only the accessible low-level signals. More-
over, tight constraints on computational resources exacerbate the
requirements on the monitor. Thus, several existing approaches
to monitoring are not applicable due to their dependence on an
operating system.
We present an FPGA-based monitoring approach by compiling
an RTLola specification into synthesizable VHDL code. RTLola is
a stream-based specification language capable of expressing com-
plex real-time properties while providing an upper bound on the
execution time and memory requirements. The statically determined
memory bound allows for a compilation to an FPGA with a
fixed size. Advantages of FPGAs are the simple integration
into existing systems and a superb execution time. The compilation
results in a highly parallel implementation thanks to the modular
nature of RTLola specifications. This further increases the maximal
event rate the monitor can handle.
KEYWORDS
Real-time Properties, Runtime Verification, FPGA
ACM Reference format:
Jan Baumeister, Bernd Finkbeiner, Maximilian Schwenger, and Hazem Tor-
fah. 2016. FPGA Stream-Monitoring of Real-time Properties. In Proceedings
of International Conference on Embedded Software, New York City, October
13 – 18, 2019 (EMSOFT19), 13 pages.
DOI: 10.1145/1122445.1122456
EMSOFT19, New York City
© 2016 ACM. . . .$15.00
1 INTRODUCTION
With the growing autonomy of cyber-physical systems, the evalu-
ation, aggregation, and monitoring of real-time data have become
essential for ensuring the safety of the system. A principled approach
to building such monitors is provided by stream-based specification
languages like RTLola [17, 18]. Input streams that collect
data from sensors, networks, etc., are filtered and combined
into output streams that contain data aggregated from multiple
sources and over multiple points in time such as over sliding win-
dows of some real-time length. Trigger conditions over these out-
put streams then identify critical situations.
Previous work has been very successful in using stream-based
specifications for analyzing recorded data streams, such as the flight
data of drones [1, 17] and network traces [16]. However, tools that
have been developed for the offline analysis of recorded data can-
not directly be used for online monitoring, such as for an onboard
monitoring component on a drone. The reason is the substantial
software overhead of such offline tools. Cyber-physical systems
operate under narrow constraints on the available resources. A
monitor must, specifically, process all data in real time and within
the available memory.
In this paper, we present a compilation approach that realizes
RTLola specifications on field-programmable gate arrays (FPGAs).
FPGAs have dramatic advantages over software-based solutions in
terms of processing speed due to the inherent parallelism, and also
in terms of other factors such as energy consumption, weight, and
ease of integration within the cyber-physical system.
In RTLola, input streams are event-driven, i.e., without a priori
known frequencies; output streams are typically periodic. This
difference is reflected in the realization of the monitor as a two-module
architecture consisting of a high-level controller and a low-level
controller. The role of the high-level controller is to receive
the events, prepare stream evaluations, and schedule periodic
tasks. The low-level controller then computes new stream values
based on the information received from the high-level controller
and triggers an alarm when appropriate.
A key challenge for the compilation is the treatment of sliding
window expressions. In general, there is no bound on the memory
needed to store the potentially unbounded number of events
received during the time period of the window. Our monitoring circuit
splits the full window into smaller chunks, where the data can
EMSOFT19, October 13 – 18, 2019, New York City Baumeister et al.
be pre-aggregated without loss of precision. As a result, the num-
ber of registers needed for the monitor can (under some mild as-
sumptions on the aggregation functions) be determined statically.
The immediate compilation to a hardware description language
allows us to achieve a high level of parallelism. For this, we analyze
the specification to identify modular sub-structures and evaluate
them in parallel. We showcase the impact of this analysis with a
synthetic case study. Furthermore, we demonstrate the practical-
ity of the compilation by presenting experimental data from two
realistic case studies from avionics and network monitoring. Both
case studies indicate that the compilation utilizes the benefits of
hardware: the implementation is highly efficient, requires only a
small board, and consumes less than 2W of power.
The main contribution of this paper is an automatic compilation
of an RTLola specification into an FPGA monitor. The resulting
circuits have a clear structure following the formal RTLola semantics.
The monitor is decoupled from the observed system. Unlike
instrumentation-based approaches [10, 13], the monitor is independent
of the origin of the data. Furthermore, there are no assumptions
on the frequency of the inputs, provided that it is lower than the
maximum clock frequency of the FPGA.
The monitor utilizes the inherently parallel nature of hardware:
the high-level controller is organized into a pipeline architecture,
which ensures that new events can enter the controller before the
processing of the previous events has been completed. In the low-level
controller, in turn, the evaluation order ensures that independent
streams are processed in parallel. Moreover, the monitor
is highly space and energy efficient. Unlike interpreter-based ap-
proaches [10, 12], which include a general-purpose runtime envi-
ronment, the compiled circuit is strictly limited to the operations
that actually occur in the specification. As a result, the monitors
of our case studies are able to run on small FPGA boards with little
power (< 2W).
1.1 Related Work
Most of the earlier work on formal runtime monitoring was based
on temporal logics [15, 20, 23, 26, 27, 38]. The approaches vary between
inline methods that realize a formal specification as assertions
added to the code to be monitored [23], and outline approaches
that separate the implementation of the monitor from the one of
the system under investigation [20]. Based on these approaches
and with the rise of real-time temporal logics such as MTL [25]
and STL [31], a series of works introduced monitoring algorithms
for real-time properties [2, 3, 14, 36].
First translations from temporal logics to monitoring circuits
have been introduced with the tools FoCs [11], developed at IBM
Haifa, P2V [30], a compiler that translates assertions written in
sPSL [8] to Verilog code, BusMOP [37], which synthesized monitor
circuits from specifications written in past-time linear temporal
logic for monitoring PCI bus traffic, and MBAC [6], an automata-
based monitor synthesizer for PSL properties. Inspired by these
constructions, an optimized approach for bounded future proper-
ties was presented in [19]. Hardware runtime monitors for real-
time properties were presented by Jaksic et al. [24], where moni-
tors for STL specifications were implemented in an FPGA. Further
work on FPGA implementations of real-time temporal specifica-
tion was introduced with the tool R2U2 [34, 35], an outline moni-
toring approach that allows for monitoring specifications in MTL
including future-time specifications.
Temporal logics come with the advantage of providing formal
guarantees on the space and time complexity of the synthesized
monitors. However, a major drawback of these logics is their limited
expressiveness. When monitoring cyber-physical systems, one needs
to express properties beyond yes and no verdicts (for example with
some degree of arithmetic operation) to be able to monitor realis-
tic properties of the system. Stream-based languages over complex
datatypes like RTLola [17, 18] provide such expressiveness and
further maintain a desirable level of formal guarantees.
The stream-based approach to monitoring was pioneered by the
specification language Lola [12]. Lola is related to synchronous
programming languages like Lustre [7, 22], and Esterel [5], which
have been widely used for the development of digital circuits [4].
In contrast to these languages, Lola is a descriptive language, which
subsumes the temporal logics and can express both past and future
properties. A feature of Lola is that upper bounds on the mem-
ory required for monitoring can be computed statically. RTLola
extends Lola with asynchronous streams and real-time features
such as sliding windows. Two other extensions of Lola are TeSSLa
and Striver. TeSSLa [10] allows for monitoring piece-wise constant
signals where streams can emit events at different speeds with ar-
bitrary latencies. It relies on the instrumentation of C code and
is thus not independent of the monitored system. Moreover, RT-
Lola comes with the feature of computing aggregations over slid-
ing windows, and allows for the decoupling of the computation
of output streams from variable input event rates via fixed-rate
clocks. e main difference between RTLola and Striver [21] is
that RTLola has both variable-rate and fixed-rate streams and pro-
vides convenient, native operators such as sample-and-hold and
sliding windows that translate between the two types of streams.
The fixed rate in RTLola allows for a more direct translation to a
hardware implementation of the monitor.
An approach for compiling synchronous Lola has been pre-
sented in [32]. We remove the assumption of synchronously ar-
riving data and add real-time capabilities to the specification lan-
guage.
2 RTLOLA
RTLola [18] is a stream-based specification language with real-
time features based on the specification language Lola [12]. In
stream-based runtime monitoring, sensor readings are interpreted
as streams of input data. These streams are fed into a stream engine
that computes new sequences of data called output streams
based on the values of input streams. The output streams compute
statistics over the sensor data and allow for stating verdicts about
the monitored system. The computation rules for output streams
are defined in RTLola by a stream equation, which is a defining
equation that maps a stream variable to a stream expression. Con-
sider for example a GPS module in a drone that delivers data about
the current longitude and latitude, and a monitor that checks if
the GPS module is delivering data in appropriate frequencies. An
RTLola specification for defining such a monitor is given by the
following stream definitions:
input gps: (Float64, Float64)
output gps_glitch: Bool @1Hz :=
    gps.aggregate(over:2s,using:count) < 10
trigger gps_glitch "GPS sensor frequency < 5Hz"
The stream gps is an input stream that represents the readings of
the GPS module and is expected to deliver data with a frequency
greater than or equal to 5Hz. To check whether this data is de-
livered with the expected frequency, we define the output stream
gps_glitch that computes a sliding window with a duration of two
seconds over the stream gps. The stream gps_glitch is computed
at a frequency of 1Hz and checks whether fewer than ten values were received
from the GPS module in the last two seconds. The window over
the input stream gps is computed via the expression gps.aggregate
(over:2s,using:count), which counts the number of data values of
gps in the last two seconds. If the number of values is less than 10,
then gps_glitch evaluates to true. In this case, an alarm is raised
with the message "GPS sensor frequency < 5Hz". This alarm is defined
by the trigger expression trigger gps_glitch.
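The semantics of gps_glitch can be mimicked in software with a minimal Python sketch. It only illustrates the description above and is not the RTLola implementation. The function name gps_glitch_verdicts, the representation of events as integer millisecond arrival times, the half-open window (t − 2 s, t], and the choice to report only evaluations with a full window are all assumptions made for this illustration.

```python
def gps_glitch_verdicts(arrivals_ms, horizon_s, window_s=2, threshold=10):
    """Return (time in s, glitch?) for every 1 Hz evaluation with a full window.

    arrivals_ms: arrival times of gps events in integer milliseconds.
    The window is assumed to cover the half-open interval (t - window_s, t].
    """
    verdicts = []
    for t in range(window_s, horizon_s + 1):
        lower_ms, upper_ms = (t - window_s) * 1000, t * 1000
        # "count" aggregation: number of events that fall into the window.
        count = sum(1 for a in arrivals_ms if lower_ms < a <= upper_ms)
        verdicts.append((t, count < threshold))
    return verdicts


# A sensor delivering at 5 Hz and one delivering at only 4 Hz, for 6 s each.
steady = [200 * k for k in range(1, 31)]
slow = [250 * k for k in range(1, 25)]
print(gps_glitch_verdicts(steady, 6))
print(gps_glitch_verdicts(slow, 6))
```

With the 5 Hz trace every full window contains ten events, so no evaluation raises the trigger; with the 4 Hz trace every window contains eight events and the trigger fires at each evaluation.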
The stream above is a periodic stream and as such computed
at a fixed frequency. In addition, RTLola allows for
the definition of event-based streams by omitting the frequency.
An event-based stream is evaluated whenever the streams occurring
in its stream expression are evaluated. For example, if we want
to check whether a vehicle is slowing down, we can compute the
change in velocity between the last two velocity sensor readings:
input velo: Float64
output slowing_down: Bool :=
    velo - velo.offset(by:-1).defaults(to:0) < 0
The stream slowing_down is computed every time velo receives a
new value. To compute the difference, the stream expression uses
the offset operator to access the last (.offset(by:-1)) and current
value of the stream velo and then compute the difference between
these two values. In case the value of an offset operation is not
defined, the default operator (.defaults(to:d)) returns the value
d. In the example above, velo.offset(by:-1) is not defined before
receiving the first velocity reading, so the default value 0 is used
instead.
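The interplay of the offset and default operators can be sketched in a few lines of Python. The sketch is an illustration under assumptions, not part of RTLola: the function name slowing_down mirrors the stream name, and the input is assumed to be the list of velo values in arrival order.

```python
def slowing_down(velocities, default=0.0):
    """Evaluate velo - velo.offset(by:-1).defaults(to:default) < 0 per event."""
    verdicts = []
    previous = None  # no value exists before the first reading
    for v in velocities:
        # The offset access falls back to the default while undefined.
        last = default if previous is None else previous
        verdicts.append(v - last < 0)
        previous = v
    return verdicts


print(slowing_down([10.0, 12.0, 11.0, 11.0]))  # [False, False, True, False]
```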
In the case where an output stream is defined over more than
one stream, the output stream is evaluated only if all streams it
depends on are evaluated as well. If one of these values is miss-
ing, one can still enforce the computation of the stream using the
sample-and-hold operator (.hold()). This operator accesses the last
value computed for a stream. If it is not present, the value provided by the
default operator (.defaults(to:d)) is used. The following specification
clarifies the role of this operator.
input gps: (Float64, Float64)
input height: Float64
output too_low: Bool := if zone(gps)
    then (height.hold().defaults(to:300)) < 300
    else false
trigger too_low "Flying low in inhabited area"
The function zone determines whether the drone is in an inhabited
area. When the vehicle is in this area, the specification checks
whether its current height (height) is less than 300 feet. If this is
the case, an alarm is raised because it violates the flight regulations
for inhabited areas.
Figure 1: Stream accesses of event-based and periodic
streams in RTLola. (Diagram labels: Event-based, Periodic, Inputs,
Output, 0-order hold, sliding window.)
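The effect of the sample-and-hold operator in too_low can be illustrated with a small Python sketch. It rests on several assumptions made only for this illustration: events are dictionaries that may carry a "gps" and/or a "height" entry, too_low is evaluated whenever gps receives a value, a height arriving in the same event counts as the held value, and the predicate passed as zone is a hypothetical stand-in for the zone function of the specification.

```python
def too_low(events, zone, default_height=300.0, limit=300.0):
    """Evaluate too_low on every event that carries a gps value."""
    held_height = None  # last height seen so far (sample and hold)
    verdicts = []
    for event in events:
        if "height" in event:
            held_height = event["height"]
        if "gps" in event:
            # Fall back to the default while no height has been received.
            height = default_height if held_height is None else held_height
            verdicts.append(zone(event["gps"]) and height < limit)
    return verdicts


# Hypothetical zone predicate: positive longitude counts as inhabited.
inhabited = lambda gps: gps[0] > 0
trace = [
    {"gps": (1.0, 1.0)},                   # no height yet, default 300 is used
    {"height": 250.0},                     # only height, too_low not evaluated
    {"gps": (1.0, 2.0)},                   # held height 250 < 300
    {"gps": (-1.0, 2.0)},                  # outside the inhabited zone
    {"gps": (2.0, 2.0), "height": 400.0},  # high enough
]
print(too_low(trace, inhabited))  # [False, True, False, False]
```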
RTLola imposes some rules on how streams may access the val-
ues of other streams. Figure 1 shows the general picture of RTLola
specifications. e values of an output stream may be used in the
definitions of other output streams as long as the following rules
are respected:
Access via sliding window: Periodic streams may access values of
other streams via a sliding window without any further restriction.
Access via offset operator: When accessing a stream with the off-
set operator, an RTLola specification must respect the following
rules:
1. Accessing periodic streams in event-based streams: These accesses
are only allowed with the sample-and-hold operator.
2. Accessing event-based streams in event-based streams: These accesses
are always valid. However, the accessing stream is only extended
if all accessed streams are extended at the same time. The
sample-and-hold operator eliminates this dependency.
3. Accessing event-based streams in periodic streams: Periodic streams
only access event-based streams with the sample-and-hold opera-
tor.
4. Accessing periodic streams in periodic streams: A periodic stream
s may access the values of another periodic stream s′ if and only if
the frequency of s′ is an integer multiple of that of s. Otherwise,
the access is only allowed via the sample-and-hold operator.
5. Recursive stream access: Any stream is allowed to access its own
history of values as long as it does not create any circular access
like accessing itself with an offset of 0. Consider the following
specification:
output num_glitches: UInt32 :=
    num_glitches.offset(by:-1).defaults(to:0) +
    (if gps_glitch then 1 else 0)
The output stream is an event-based stream that is evaluated every
time a new value is computed for gps_glitch. Note that there is no
need for the sample-and-hold operator as the output stream only
depends on gps_glitch. If the new value of gps_glitch is true, then
the new value of num_glitches is computed by increasing its last
value (num_glitches.offset(by:-1).defaults(to:0)) by one. Oth-
erwise, if gps_glitch is false, the new value of num_glitches is equal
to its last one.
For the full syntax and type system of RTLola we refer the
reader to the technical report¹ [18].
In the rest of the paper we use the variables n↑, n↓ and n∗ to
indicate the number of output streams, number of input streams
and number of triggers in an RTLola specification, respectively.
2.1 Monitoring RTLola Specifications
Monitoring an RTLola specification consists of receiving events,
evaluating stream expressions, and triggering alerts when necessary.
The separation of event-based and periodic streams manifests
itself in the monitoring algorithm in that it consists of an event-
based and a periodic process.
The event-based process receives an event and extends streams
according to the evaluation order ≺, i.e., if the stream expression
of stream s contains a lookup with target s′, then s′ ≺ s. Thus, s′
needs to be extended before s. The event-based process respects
this by successively evaluating streams as soon as the evaluation
order permits it.
The periodic process schedules streams according to their frequency.
Since all frequencies are determined a priori, we can compute
an array of deadlines, where deadline Di consists of a delay di and a
set of streams Si such that Si needs to be evaluated di seconds
after Di−1 was due. The least common multiple of the periods
of all periodic streams is the hyper-period (Π), and #dl denotes
the number of deadlines within one hyper-period. Like the event-based
process, the periodic process also respects the evaluation order.
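The construction of the deadline array can be sketched in Python. This is a minimal illustration under assumptions, not the compiler's actual procedure: the names hyper_period and deadline_array are hypothetical, frequencies are given in Hz as a dictionary from stream names to numbers, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def hyper_period(periods):
    """Least common multiple of rational periods given as reduced Fractions."""
    num = reduce(lambda a, b: a * b // gcd(a, b), [p.numerator for p in periods])
    den = reduce(gcd, [p.denominator for p in periods])
    return Fraction(num, den)


def deadline_array(freqs_hz):
    """Map stream name -> frequency in Hz to (hyper-period, [(d_i, S_i), ...])."""
    periods = {name: 1 / Fraction(f) for name, f in freqs_hz.items()}
    pi = hyper_period(list(periods.values()))
    due = {}  # point in time within the hyper-period -> streams due then
    for name, period in periods.items():
        t = period
        while t <= pi:
            due.setdefault(t, set()).add(name)
            t += period
    deadlines, previous = [], Fraction(0)
    for t in sorted(due):
        deadlines.append((t - previous, due[t]))  # delay since previous deadline
        previous = t
    return pi, deadlines


pi, dls = deadline_array({"a": 2, "b": 5})
print(pi, len(dls))  # hyper-period of 1 s with 6 deadlines
```

For two streams with 2 Hz and 5 Hz, the sketch yields a hyper-period of 1 s and six deadlines with delays 1/5, 1/5, 1/10, 1/10, 1/5, and 1/5 seconds; both streams are due at the last deadline.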
An RTLola specification can be monitored in one of two modes.
Offline mode describes a monitoring process that happens after the
fact based on log data. It is useful for post-mortem analyses or
for validating a specification based on previous system runs. On-
line mode, however, is the concurrent execution of a system and
its monitor. FPGA-based monitoring is especially interesting for
the online mode because this mode requires timely processing of
events and imposes tighter restrictions on the monitor in terms of
available resources.
The major difference between the two modes in the evaluation
process is the source of the current timestamp. In online mode, the
value is the system time of the monitor. In offline mode, however,
events are annotated with time stamps. The monitor considers the
received time stamp to be the current time and checks whether a
deadline would have been missed. If so, it first computes all periodic
streams affected by the deadline. Afterwards, it processes the
event as described before.
2.2 Sliding Windows
The evaluation of sliding windows needs special attention. Assume
the stream expression of s, which is evaluated with a period of x seconds
(i.e., with frequency x⁻¹ Hz), contains a
sliding window expression such as s′.aggregate(over:δ,using:γ)
for some duration δ and aggregation function γ. A naive implementation
needs to store all values of s′ within the last δ seconds, which
is infeasible because there is no information about the arrival frequency
of s′. If γ : A∗ → B is a list homomorphism as defined
by Maarten [33], the sliding window can be evaluated accurately
with only a finite amount of memory. List homomorphisms can be
split into four components: a unary function map: A → T, a finalization
function fin: T → B, an associative binary reduction ⊕ : T × T → T, and a
neutral element ε w.r.t. ⊕. Assuming γ is a list homomorphism, we
utilize the fact that sliding windows only occur in periodic streams.
All new values occurring within the same time interval of x seconds are effectively
equivalent w.r.t. their arrival time. We now apply the bucketing
approach proposed by Li et al. [28] and split the duration of the
window into δ · x⁻¹ equal-sized buckets. Each bucket stores an intermediate
value, initialized with ε, and pre-aggregates all values
arriving between two consecutive evaluations of the window expression using ⊕. At the
time of the evaluation, the intermediate values get reduced to obtain
the final value.
Event  Time   velo  p1        p2        p3        avg_velo
       0.0 s        ε         ε         ε
1      0.5 s  10.0  ε         ε         (10.0,1)
2      0.6 s  10.1  ε         ε         (20.1,2)
       1.0 s        ε         ε         (20.1,2)  8.0
       2.0 s        ε         (20.1,2)  ε         8.0
3      2.2 s  9.9   (20.1,2)  ε         (9.9,1)
       3.0 s        (20.1,2)  ε         (9.9,1)   10.0
Figure 2: Detailed computation of a sliding average.
¹ The technical report also describes parametrization with dynamic stream creation,
which we do not consider here.
Fortunately, many commonly used aggregation functions are
list homomorphisms, such as summation, minimization, maximiza-
tion, counting, integration, and averaging.
As an example, consider the following specification:
input velo: Float32
output avg_velo @1Hz :=
    velo.aggregate(over:3s,using:avg)
        .defaults(to:8.0)
Since the average is a list homomorphism, we define the following
concrete components:
• map: R → R × N with map(v) ≔ (v, 1)
• fin: R × N → R with fin(v, c) ≔ v / c
• ⊕ : (R × N)² → R × N with (v1, c1) ⊕ (v2, c2) ≔ (v1 + v2, c1 + c2)
• ε ≔ (0, 0)
Figure 2 details the computation of the average with three buck-
ets. We list the values for all buckets at points in time when either
an event arrives or avg_velo gets computed. Here, p1 represents
the “oldest” bucket, and p3 the most recent one.
Initially, all buckets contain the element ε. Upon receiving the
first velocity at time stamp 0.5 s, the value of the last bucket is
changed to (0, 0) ⊕ map(10.0) = (10.0, 1). When the next event is
received at time stamp 0.6 s, we add the value to the same bucket
and get (10.0, 1) ⊕ map(10.1) = (20.1, 2). At time stamp 1.0 s, we
compute avg_velo for the first time. Since the current time stamp
is less than the length of the window, the default value is used.
Afterwards, we evict the oldest bucket, shift all bucket values to
the left, and add a new one with value ε. The same happens at time
stamp 2.0 s. The next event arrives at time stamp 2.2 s and is added
to p3. At time 3 s, we stop using the default value and aggregate the
buckets. The resulting value is finalized, i.e., fin((20.1, 2) ⊕ (0, 0) ⊕
(9.9, 1)) = fin((30.0, 3)) = 30/3 = 10.
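The bucketed evaluation of the sliding average can be replayed in software. The following Python sketch is a software analogue of the scheme described above, not the generated hardware; the class name SlidingAverage and the method names on_event and on_deadline are hypothetical. As a further assumption, the default value is also returned when a full window contains no events, which avoids a division by zero in fin.

```python
from collections import deque

EPS = (0.0, 0)  # neutral element of the reduction


def lift(v):
    """map: wrap a single value into a (sum, count) pair."""
    return (v, 1)


def combine(a, b):
    """Associative reduction on (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def fin(acc):
    """Finalization: turn (sum, count) into the average."""
    return acc[0] / acc[1]


class SlidingAverage:
    def __init__(self, num_buckets, default):
        # maxlen makes append() evict the oldest bucket automatically.
        self.buckets = deque([EPS] * num_buckets, maxlen=num_buckets)
        self.default = default
        self.evaluations = 0

    def on_event(self, value):
        # Pre-aggregate the new value into the most recent bucket.
        self.buckets[-1] = combine(self.buckets[-1], lift(value))

    def on_deadline(self):
        self.evaluations += 1
        total = EPS
        for bucket in self.buckets:
            total = combine(total, bucket)
        window_full = self.evaluations >= len(self.buckets)
        result = fin(total) if window_full and total[1] > 0 else self.default
        self.buckets.append(EPS)  # shift left and open a fresh bucket
        return result


# Replay of the trace in Figure 2.
window = SlidingAverage(num_buckets=3, default=8.0)
window.on_event(10.0)          # 0.5 s
window.on_event(10.1)          # 0.6 s
first = window.on_deadline()   # 1.0 s, default
second = window.on_deadline()  # 2.0 s, default
window.on_event(9.9)           # 2.2 s
third = window.on_deadline()   # 3.0 s, average over all three events
print(first, second, third)
```

The first two evaluations return the default 8.0, and the third returns approximately 10.0 (up to floating-point rounding), matching the last column of Figure 2.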
3 COMPILATION
The hardware realization of an RTLola specification consists of
two modules connected via a first-in-first-out queue, as can be seen
in Figure 3. The High-level Controller (HLC) receives external events
consisting of event data for each affected input stream and a time
stamp in offline mode, as well as the system time in online mode.
The HLC acts as mediator between event-based inputs and periodic
deadlines, such that later components in the architecture do
not need to distinguish them anymore. The number of bits the HLC
receives is $s_{ts} + \sum_{i=1}^{n^\downarrow} (s_i + 1)$, where $s_{ts}$ and $s_i$ denote the number
of bits required to represent a single timestamp and value of input
stream i, respectively. The additional bit per input stream indicates
whether the current event contains a new value for the respective
stream. The HLC decides whether a periodic deadline is due or an
event ought to be evaluated. This decision is based on information
about events and the internal system clock. The respective information
is preprocessed with respect to the specification and stored
in the Queue. It consists of $s_{ev} = \left(\sum_{i=1}^{n^\downarrow} (s_i + 1)\right) + s_{ts} + n^\uparrow$ bits with
the following semantics:
(1) $\sum_{i=1}^{n^\downarrow} (s_i + 1)$ bits encode an event as explained before. If
the signal encodes a deadline, all bits are 0, indicating that
no data is available.
(2) $s_{ts}$ bits contain the time stamp used for the evaluation of
sliding windows and as an implicitly defined input stream
with name time.
(3) $n^\uparrow$ bits declare for each output stream whether or not it
is affected by the current deadline or event.
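The two bit widths can be computed directly from the formulas above. The Python helpers below are a small illustration; the function names and the concrete widths in the example (a 64-bit time stamp, a gps stream of two 64-bit floats, a 64-bit height stream, one output stream) are assumptions chosen for this sketch.

```python
def event_width(input_bits, ts_bits):
    """Bits received by the HLC: time stamp plus (value + presence bit) per input."""
    return ts_bits + sum(s + 1 for s in input_bits)


def queue_entry_width(input_bits, ts_bits, num_outputs):
    """Bits of one queue entry: event data, time stamp, one flag per output stream."""
    return sum(s + 1 for s in input_bits) + ts_bits + num_outputs


inputs = [128, 64]  # gps: (Float64, Float64), height: Float64
print(event_width(inputs, ts_bits=64))                       # 258
print(queue_entry_width(inputs, ts_bits=64, num_outputs=1))  # 259
```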
The Low-level Controller (LLC) uses this information to manage the
evaluation process: all input streams and all output streams whose
expressions can be evaluated immediately are extended first, followed
by the remaining output streams in further steps according to the
evaluation order. The LLC also manages updates and the evaluation
of sliding windows occurring in output stream expressions.
Due to the lower complexity of HLC’s task, it is capable of re-
ceiving events faster than the LLC can process them. For this rea-
son, the queue acts as a buffer between the two components. While
this does not prevent a loss of data when the pressure on the eval-
uator exceeds its limits for an extended amount of time, it tem-
porarily relieves the stress of a sudden burst of events. Moreover,
it cleanly decouples the two components, enabling them to work
independently and concurrently at their own pace.
3.1 Notation
We first introduce some notation. The ◦ operator denotes bit-concatenation.
$0^n$ denotes an n-fold concatenation of 0-bits. Let x
be a bit string of length n. x[i] denotes the ith bit of x assuming
i < n. x[ℓ . . . u] is the substring x[ℓ] ◦ x[ℓ + 1] ◦ · · · ◦ x[u − 1] for
ℓ < u ≤ n. The bounds can be omitted, i.e., x[. . . u] = x[0 . . . u]
and x[ℓ . . . ] = x[ℓ . . . n]. Further, let ξ be the internal system clock
rate, and sums over all input streams are abbreviated by omitting
the limits, i.e., $\sum s_i = \sum_{1 \le i}^{n^\downarrow} s_i$.
Figure 3: Schematic of an RTLola monitor composed of
two modules connected via a queue. The High-level Controller
manages the order in which periodic and event-based
streams have to be evaluated. The Low-level Controller manages
the evaluation process of all affected streams.
(Diagram labels: Event-based, Periodic, Inputs, High-level Controller,
Queue, Low-level Controller, Outputs. Signals: event ∈ $B^{s_{ts} + \sum (s_i + 1)}$,
push ∈ B, qin ∈ $B^{s_{ev}}$, pop ∈ B, empty ∈ B, qout ∈ $B^{s_{ev}}$, trig ∈ $B^{n^*}$.)
We distinguish between signals and registers. The former are
data lines between components, which we will write in a slanted
font, such as signal. The latter are mere flip-flop components that
are updated with a rising clock edge, written in bold face: register.
3.2 High-level Controller
This module receives external events and schedules periodic tasks.
It pre-processes data with respect to the specification and stores
the information in the queue.
Figure 4 shows the schematic of the module. Dotted lines represent
signals and components that are only present in the offline
mode. The HLC has access to the common system clock sclk, and
two registers avail and din, which are written by an external entity
and contain data of new events. The components are organized
in a pipeline architecture, which ensures that new events can enter
the controller before the processing of the previous events has
been completed. The green, top-left-striped part handles the event-based
inputs, whereas the blue, top-right-striped part handles periodic
deadlines. The HLQInterface then unifies events and deadlines.
PreScaler. is component scales the system clock sclk down
by a constant factor to theHLC-internal hclk clock. hclk drives the
Scheduler, EventDelay, and the ExtInterface. e PreScaler
also provides an internal clock for theHLQInterface, which ticks
twice as fast as hclk and slower than sclk. For a cleaner illustration,
Figure 4 does not include the respective data lines, as well as valid
EMSOFT19, October 13 – 18, 2019, New York City Baumeister et al.
Event-based
Periodic
PreScalerExtInterface
TimeSelect
Scheduler
EventDelay
HLQInterface
B
sts ∋ ext ts ev ∈ B
∑
(si+1)
B
sts ∋ its
B
sts+#dl ∋ dl hold ∈ B
tev ∈ Bsts+
∑
(si+1)
B ∋ push data ∈ Bsev
Figure 4: Schematic of the High-level Controller receiving
external events, managing periodic deadlines, and prepar-
ing data for the Low-level Controller.
bits accompanying every data line with width greater than 1 indi-
cating the presence of meaningful data on the wire.
ExtInterface. is component handles the communicationwith
external input sources. e external source writes a 1-bit latch
avail when new input data is available in the din register. In on-
line mode, the ExtInterface reads din, and forwards it to the
EventDelay. In offline mode, the input event also contains a time
stamp, which the ExtInterface extracts and forwards to theTime-
Select component. In both modes, it then clears avail, indicating
that the next event can be received.
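Formally, ExtInterface waits on hclk and behaves as follows,
where ev carries the event data, ext ts is the external time stamp
received with the event, and valid ext ts indicates whether there is
new and valid data on the ext ts wire.
\[
\begin{aligned}
\mathit{ev}^{0} &= 0^{\sum s_i} \\
\mathit{ev}^{t+1} &= \begin{cases} \mathit{din}^{t}[s_{ts} \ldots] & \text{if } \mathit{avail}^{t} \\ 0^{\sum s_i} & \text{otherwise} \end{cases} \\
\mathit{avail}^{0} &= 0 \\
\mathit{avail}^{t+1} &= \begin{cases} 1 & \text{if } \mathit{external}^{t} \wedge \neg \mathit{avail}^{t} \\ 0 & \text{otherwise} \end{cases} \\
\mathit{ext\ ts}^{0} &= 0^{s_{ts}} \\
\mathit{ext\ ts}^{t+1} &= \begin{cases} \mathit{din}^{t}[\ldots s_{ts}] & \text{if } \mathit{avail}^{t} \\ 0^{s_{ts}} & \text{otherwise} \end{cases} \\
\mathit{valid\ ext\ ts}^{0} &= \mathit{valid\ ev}^{0} = 0 \\
\mathit{valid\ ext\ ts}^{t+1} &= \mathit{valid\ ev}^{t+1} = \mathit{avail}^{t}
\end{aligned}
\]
Here, external is an oracle indicating a change depending on an
external event.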
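The recurrences of the ExtInterface can be simulated cycle by cycle. The Python sketch below is only a software model of the equations for the offline mode, not the generated VHDL. Bit strings are modelled as Python strings of '0' and '1', the function name ext_interface and the parameter ev_bits (the width of the event data) are assumptions, and the trace supplies the oracle external and the content of din for each hclk cycle.

```python
def ext_interface(trace, ts_bits, ev_bits):
    """Simulate the ExtInterface; trace is a list of (external, din) per cycle.

    Returns the list of states (ev, avail, ext_ts, valid) for t = 0, 1, ...
    where valid stands for both valid ext ts and valid ev.
    """
    ev, avail, ext_ts, valid = "0" * ev_bits, 0, "0" * ts_bits, 0
    states = [(ev, avail, ext_ts, valid)]
    for external, din in trace:
        # All next-state values depend only on the values of the current cycle.
        next_ev = din[ts_bits:] if avail else "0" * ev_bits
        next_ext_ts = din[:ts_bits] if avail else "0" * ts_bits
        next_valid = avail
        next_avail = 1 if external and not avail else 0
        ev, avail, ext_ts, valid = next_ev, next_avail, next_ext_ts, next_valid
        states.append((ev, avail, ext_ts, valid))
    return states


# One event with time stamp "10" and event data "111" arrives in cycle 0.
trace = [(True, "10111"), (False, "10111"), (False, "00000")]
for state in ext_interface(trace, ts_bits=2, ev_bits=3):
    print(state)
```

In this run, avail is set one cycle after the external write, the time stamp and event data appear on ext ts and ev in the following cycle together with the valid bit, and all signals return to zero afterwards.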
TimeSelect. ecomponent waits on the system clock and com-
putes the internal time stamp its. In offline mode, this is simply
the time stamp formerly extracted from the input event. us, this
component boils down to a simple wire and does not introduce
any delay in the signal. In online mode, however, this component
computes the time that has passed so far by repeatedly adding the
period ξ of the system clock. is component uses an internal reg-
ister reg its mirroring the value of its. It persists the value of the
signal without introducing a delay2.
reg its0 = 0sts
reg itst+1 = reg itst + ξ = (t + 1) ∗ ξ
itst = reg itst
valid itst = 1
Scheduler. is component inspects the current internal time-
stamp its and detects when a periodic stream is due. It first deter-
mines the start time and stores it in the period register: in online
mode that is simply 0sts , whereas in offline mode this is the first
time stamp received from the external source. It then maintains
the invariant that period contains the least time stamp in the cur-
rent hyper-period. If, for example, the specification contains two
periodic streams with frequency 2Hz and 5Hz, then the hyper-
period is 1 s. If the first received event carries the timestamp 3.4 s,
period remains 3.4 s until a time stamp greater than or equal to
3.3 s + Π = 4.4 s is received. In this case, it jumps to 4.4 s. As a
result, the difference between its and period represents the time
within the current hyper-period.
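The invariant on the period register can be illustrated with a short Python sketch. It is a simplification under assumptions: the function name advance_period is hypothetical, exact rationals stand in for the fixed-width time stamps, and the loop collapses several hyper-period jumps into one call, whereas the hardware performs one jump when the last deadline of a hyper-period has been handled.

```python
from fractions import Fraction


def advance_period(period, its, hyper_period):
    """Return the least time stamp of the hyper-period that contains its."""
    while its >= period + hyper_period:
        period += hyper_period
    return period


start = Fraction("3.4")  # first received time stamp in offline mode
pi = Fraction(1)         # hyper-period for streams with 2 Hz and 5 Hz

print(advance_period(start, Fraction("4.3"), pi))  # 17/5, still 3.4 s
print(advance_period(start, Fraction("4.4"), pi))  # 22/5, jumped to 4.4 s

# The position within the current hyper-period is its - period.
its = Fraction("4.6")
print(its - advance_period(start, its, pi))        # 1/5
```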
The register did contains the id of the current deadline, i.e., the
deadline that needs to be evaluated next, in unary encoding. The
encoding is a trade-off: a binary encoding requires fewer registers
and wires but also two decoders, one in the Scheduler and one in
the HLQInterface. The did register is initialized with $0^{\#dl}$, which
is an invalid unary number and indicates that the Scheduler has
not been initialized yet, i.e., it has not received a start time. The
initialization takes place in the first cycle in online mode, or in the
first cycle with enabled valid_ext_ts bit in offline mode.
Lastly, the prog(ress) signal indicates whether a new deadline is
due. It checks whether the Scheduler was initialized and whether
the position in the current hyper-period exceeds the current deadline.
For this check, it accesses the statically determined array of
deadline offsets as described in Section 2.1. The lookup consists of
conjoining each element of the array with the respective bit of
did and bitwise disjoining all results:
\[
dl(\mathit{did}) = \bigvee_{i=1}^{\#dl} dl_i \wedge \mathit{did}[i]
\]
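The lookup can be illustrated in software as follows. This is a sketch under assumptions: `dl_lookup` is a hypothetical name, the deadline offsets are made-up integers in milliseconds, and did is modeled as a list of bits.

```python
def dl_lookup(deadlines, did):
    """Select the deadline whose did bit is set, using only AND and OR."""
    result = 0
    for offset, bit in zip(deadlines, did):
        mask = -bit              # 0 stays 0, 1 becomes all ones (two's complement)
        result |= offset & mask  # conjoin with the bit, then disjoin all results
    return result


offsets_ms = [200, 400, 500]     # hypothetical deadline offsets
print(dl_lookup(offsets_ms, [0, 1, 0]))  # 400
print(dl_lookup(offsets_ms, [0, 0, 0]))  # 0, the uninitialized did selects nothing
```

Because at most one bit of did is set, the disjunction never mixes two offsets, which is why no binary decoder is needed.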
In the following definitions, a subscript off (on) indicates the
offline (online) version of the register or signal. Usages without
subscript use the respective version.

² This can be achieved by letting the input wire of the register carry the same signal
as the output wire.

FPGA Stream Monitoring EMSOFT19, October 13 – 18, 2019, New York City
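\[
\begin{aligned}
\mathit{init}_{off}^{0} &= 0 \\
\mathit{init}_{off}^{t+1} &= \mathit{valid\_its}^{t+1} \wedge (\mathit{did}^{t} = 0^{\#dl}) \\
\mathit{init}_{on}^{t} &= \begin{cases} 1 & \text{if } t = 1 \\ 0 & \text{otherwise} \end{cases} \\
\mathit{did}^{0} &= 0^{\#dl} \\
\mathit{did}^{t+1} &= \begin{cases} 1\,0^{\#dl-1} & \text{if } \mathit{init}^{t+1} \\ csr(\mathit{did}^{t}) & \text{if } \neg \mathit{init}^{t+1} \wedge \mathit{prog}^{t+1} \\ \mathit{did}^{t} & \text{otherwise} \end{cases} \\
\mathit{period}^{0} &= 0^{\#dl} \\
\mathit{period}_{off}^{t+1} &= \begin{cases} \mathit{its}^{t+1} & \text{if } \mathit{init}^{t+1} \\ \mathit{period}_{off}^{t} + \Pi & \text{if } \mathit{did}^{t} = 0^{\#dl-1}1 \wedge \mathit{prog}^{t+1} \\ \mathit{period}_{off}^{t} & \text{otherwise} \end{cases} \\
\mathit{period}_{on}^{t+1} &= \begin{cases} 0 & \text{if } \mathit{init}^{t+1} \\ \mathit{period}_{on}^{t} + \Pi & \text{if } \mathit{did}^{t} = 0^{\#dl-1}1 \wedge \mathit{prog}^{t+1} \\ \mathit{period}_{on}^{t} & \text{otherwise} \end{cases} \\
\mathit{prog}^{t+1} &= (\mathit{did}^{t} \neq 0^{\#dl}) \wedge (\mathit{its}^{t+1} - \mathit{period}^{t}) > dl(\mathit{did}^{t})
\end{aligned}
\]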
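Here, csr is a 1-bit cyclic shift to the right. The output signals
are thus defined as:
\[
\begin{aligned}
\mathit{hold}_{on}^{t} &= 0 \\
\mathit{hold}_{off}^{t} &= \mathit{prog}^{t} \\
\mathit{dl}^{t} &= \mathit{its}^{t} \circ \mathit{did}^{t} \\
\mathit{valid\_dl}^{t} &= \neg \mathit{prog}^{t}
\end{aligned}
\]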
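The update of the did register can be mimicked in a few lines. The sketch below assumes did is a tuple of bits with exactly one bit set once initialized; `csr` follows the definition above, while `next_did` is a hypothetical helper name.

```python
def csr(bits):
    """1-bit cyclic shift to the right: the last bit wraps around to the front."""
    return bits[-1:] + bits[:-1]


def next_did(did, init, prog):
    """Next value of the did register for the given init and prog signals."""
    if init:
        return (1,) + (0,) * (len(did) - 1)   # first deadline
    if prog:
        return csr(did)                       # advance to the next deadline
    return did                                # keep the current deadline


did = next_did((0, 0, 0), init=True, prog=False)
print(did)                                    # (1, 0, 0)
did = next_did(did, init=False, prog=True)
print(did)                                    # (0, 1, 0)
did = next_did(did, init=False, prog=True)
did = next_did(did, init=False, prog=True)
print(did)                                    # (1, 0, 0), wrapped into the next hyper-period
```

The wrap-around from the last deadline back to the first one is the moment at which period is incremented by Π in the definitions above.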
EventDelay. This component composes the internal time stamp
and the current event. The time stamp is later used in the evaluation
process. In online mode, the compound signal is then passed
to the HLQInterface without delaying the signal.
In offline mode, however, the EventDelay needs to take the
hold signal into account. To compensate for the delay introduced
by the Scheduler, the compound signal is delayed by one cycle.
Afterwards, the data is delayed further until hold turns off. During
the hold period, new events can be received and need to be stalled.
We discuss this issue below.
Formally, the component waits on hclk and uses two internal
registers, data, which introduces the mandatory one-cycle delay,
and reg_tev, mirroring the signal tev.
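\[
\begin{aligned}
\mathit{data}^{0} &= 0^{1 + s_{ts} + \sum (s_i + 1)} \\
\mathit{data}^{t+1} &= \begin{cases} \mathit{data}^{t} & \text{if } \mathit{hold}^{t+1} \\ \mathit{valid\_ev}^{t+1} \circ \mathit{its}^{t+1} \circ \mathit{ev}^{t+1} & \text{otherwise} \end{cases} \\
\mathit{stalled}^{0} &= 0^{1 + s_{ts} + \sum (s_i + 1)} \\
\mathit{stalled}^{t+1} &= \begin{cases} \mathit{stalled}^{t} & \text{if } \mathit{hold}^{t+1} \\ \mathit{data}^{t} & \text{otherwise} \end{cases} \\
\mathit{tev}^{t} &= \mathit{stalled}^{t}[1 \dots] \\
\mathit{valid\_tev}^{t+1} &= \neg \mathit{hold}^{t+1} \wedge \mathit{tev}^{t}[0]
\end{aligned}
\]
Note that ev and its are always valid at the same point in time, so
we can verify the invariant
\[
\forall t : \mathit{valid\_ev}^{t} \iff \mathit{valid\_its}^{t}
\]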
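The interplay of the two registers can be traced with a small simulation. This is a sketch only: `event_delay_run` is a hypothetical name, the compound signal is abstracted to an arbitrary Python value, and 0 stands for the all-zero initial register content.

```python
def event_delay_run(incoming, holds):
    """Trace the `stalled` register over the given hclk cycles.

    incoming[t] is the compound word offered in cycle t, holds[t] the hold signal.
    """
    data, stalled = 0, 0
    trace = []
    for word, hold in zip(incoming, holds):
        if not hold:
            # Both registers advance together; stalled takes the old value of data.
            data, stalled = word, data
        trace.append(stalled)
    return trace


# Event "b" is offered while hold is raised, so it has to be offered again.
print(event_delay_run(["a", "b", "b", "c"], [False, True, False, False]))
# [0, 0, 'a', 'b']
```

The trace shows the mandatory one-cycle delay for "a" and the additional cycle that "b" spends waiting while hold is raised.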
HLQInterface. This component accepts data from the EventDelay
and the Scheduler and forwards information to the queue. It can
only push one data packet per cycle to the queue. Both in offline
and online mode, however, it can receive a deadline and an event
at the same time. For this reason, this component is clocked twice
as fast as hclk. This enables it to wait on events in even cycles
and on deadlines in odd cycles. Yet, it needs to be slower
than sclk such that the queue can still process both data packets
in time. As a result, it grants precedence to events. This is desired
to compensate for the delay introduced by the EventDelay and to
preserve the correct order of events and deadlines.
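Formally, in even cycles this component computes:
\[
\begin{aligned}
\mathit{push}^{t} &= \mathit{valid\_ev}^{t} \\
\mathit{data}^{t} &= \mathit{ev}^{t} \circ \bigvee_{i=1}^{n_{\downarrow}} \Big( dep(i) \wedge \mathit{ev}^{t}\Big[\sum_{j=1}^{i} (s_j + 1) - 1\Big] \Big)
\end{aligned}
\]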
Here, dep is another static array of $n_{\uparrow}$-bit-wide registers where
each bit represents a dependency between streams. I.e., if dep(i)[j]
is on, output stream j transitively depends on input stream i and
thus has to be evaluated with the current event. The respective
dependencies are conjoined with $\mathit{ev}[\sum_{j=1}^{i} (s_j + 1) - 1]$, i.e., the bit
indicating whether the current event carries a new value for input
stream i. Overall, the data sent to the queue thus contains the event
data, the time stamp of the event, and one bit per stream indicating
whether the stream will be evaluated.
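In odd cycles, the data signal only contains the streams affected
by the deadline:
\[
\begin{aligned}
\mathit{push}^{t} &= \mathit{valid\_dl}^{t} \\
\mathit{data}^{t} &= 0^{\sum (s_i + 1)} \circ \mathit{dl}^{t}[\dots s_{ts}] \circ \mathit{dl\_target}(\mathit{dl}^{t}[s_{ts} \dots])
\end{aligned}
\]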
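The dependency mask computed in even cycles can be sketched as follows. The names and the example dependencies are hypothetical: `dep[i]` is an integer bit mask over the output streams, and `fresh[i]` is the bit indicating that the event carries a new value for input stream i.

```python
def affected_outputs(dep, fresh):
    """Disjoin dep[i] over all input streams i that received a new value."""
    mask = 0
    for dep_i, is_fresh in zip(dep, fresh):
        if is_fresh:          # conjunction of dep(i) with the validity bit
            mask |= dep_i     # disjunction over all input streams
    return mask


# Two inputs, three outputs: input 0 feeds outputs 0 and 1, input 1 feeds output 2.
dep = [0b011, 0b100]
print(bin(affected_outputs(dep, [1, 0])))  # 0b11
print(bin(affected_outputs(dep, [1, 1])))  # 0b111
print(bin(affected_outputs(dep, [0, 0])))  # 0b0
```

Since dep is static, the hardware version reduces to a fixed AND/OR network without any lookup at runtime.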
3.3 Input Buffering
The stalling mechanism in the EventDelay and Scheduler is only
necessary in offline mode. Two consecutive events $e_i$ and $e_{i+1}$
can have time stamps that skip several deadlines. In this case, the
Scheduler repeatedly considers $e_{i+1}$ as a new value and triggers
the computation of a deadline until no more deadline is due. During
this time, it raises the hold flag, so that the EventDelay stalls
$e_{i+1}$ before sending it to the HLQInterface. While stalling, the
ExtInterface can continue receiving events that are either lost
or override $e_{i+1}$. To prevent this, we add an input buffer of size L
in front of the Scheduler and EventDelay. The required buffer
size can be computed based on the input data. Assume that the
HLC receives a new input value every δ hclk cycles. The backlog
$bl(e_i)$ describes how many cycles it takes to fully process all
entries currently in the buffer when receiving event $e_i$, including all
deadlines induced by $e_i$.
\[
\begin{aligned}
bl(e_1) &= 0 \\
bl(e_{i+1}) &= bl(e_i) - \min\{bl(e_i), \delta - 1\} + dld(e_{i+1})
\end{aligned}
\]
Here, $dld(e_i)$ is the number of periodic deadlines that become due
when receiving $e_i$. Intuitively, between event $e_i$ and $e_{i+1}$, δ − 1
cycles pass without a new event, so we either process δ − 1 deadlines
or events, or all entries in the buffer. Upon receiving $e_{i+1}$,
we need to process an additional $dld(e_{i+1})$ deadlines plus the new
event. At the same time, another cycle passes, so we can immediately
process one event or deadline. This effectively cancels out the
incoming event, so only $dld(e_{i+1})$ needs to be taken into account.
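The recurrence translates directly into code, which makes it easy to size the buffer for a recorded trace. The sketch below uses the hypothetical name `backlog`; the list `dld` holds the made-up number of due deadlines per event, starting with $e_1$.

```python
def backlog(dld, delta):
    """Backlog per event: bl(e_1) = 0 and
    bl(e_{i+1}) = bl(e_i) - min(bl(e_i), delta - 1) + dld(e_{i+1})."""
    bl = [0]
    for due in dld[1:]:
        previous = bl[-1]
        bl.append(previous - min(previous, delta - 1) + due)
    return bl


dld = [0, 5, 0, 3]          # hypothetical trace: e_2 skips 5 deadlines, e_4 skips 3
bl = backlog(dld, delta=2)  # a new event every 2 hclk cycles
print(bl)                   # [0, 5, 4, 6]
print(max(bl))              # 6, so a buffer size L >= 6 satisfies the premise below
```

A larger δ drains the backlog faster between events, so the same trace needs a smaller buffer.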
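Let B be a buffer of size L with the following semantics, where
$B^{\eta}_{i}$ is the i-th entry of B at cycle η:
\[
\begin{aligned}
B^{0} &= \{\bot\}^{L} \\
B^{\eta+1} &= \begin{cases}
B^{\eta} \ll 1 & \text{if } \neg \mathit{hold}^{\eta+1} \wedge \neg \mathit{valid\_its}^{\eta+1} \\
B^{\eta} & \text{if } \mathit{hold}^{\eta+1} \wedge \neg \mathit{valid\_its}^{\eta+1} \\
B^{\eta} \oplus \mathit{its}^{\eta+1} & \text{if } \mathit{hold}^{\eta+1} \wedge \mathit{valid\_its}^{\eta+1} \\
(B^{\eta} \ll 1) \oplus \mathit{its}^{\eta+1} & \text{if } \neg \mathit{hold}^{\eta+1} \wedge \mathit{valid\_its}^{\eta+1}
\end{cases}
\end{aligned}
\]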
Here, $B \ll 1$ shifts the entire buffer content to the left, i.e., the first
and thus oldest entry gets evicted, the (n + 1)-st entry becomes the
n-th, and the last entry becomes ⊥. $B \oplus \nu$ denotes that the first free
entry of B, i.e., the first k with $B_k = \bot$, is replaced by ν. If no such
entry exists, the buffer overflows. Formally, the theorem states the
following:
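Theorem 3.1. If the buffer size L is at least the maximal backlog bl,
the buffer will never overflow:
\[
L \geq \max\{bl\} \implies \forall \eta : \neg \mathit{valid\_its}^{\eta} \vee \neg \mathit{hold}^{\eta} \vee B^{\eta}_{L} = \bot
\]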
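Proof. We define an abstract buffer $\tilde{B}$ where each abstract
entry corresponds to a concrete one in B. Its value states how many
clock cycles are required to process the deadlines induced by the
respective concrete entry if it were the first one.
\[
\begin{aligned}
\tilde{B}^{0} &= \{\bot\}^{L} \\
\tilde{B}^{\eta+1} &= \begin{cases}
dec(\tilde{B}^{\eta}) & \text{if } \tilde{B}^{\eta}_{1} > 0 \wedge \neg \mathit{valid\_its}^{\eta+1} \\
dec(\tilde{B}^{\eta}) \oplus dld(\mathit{its}^{\eta+1}) & \text{if } \tilde{B}^{\eta}_{1} > 0 \wedge \mathit{valid\_its}^{\eta+1} \\
\tilde{B}^{\eta} \ll 1 & \text{if } \tilde{B}^{\eta}_{1} = 0 \wedge \neg \mathit{valid\_its}^{\eta+1} \\
(\tilde{B}^{\eta} \ll 1) \oplus dld(\mathit{its}^{\eta+1}) & \text{if } \tilde{B}^{\eta}_{1} = 0 \wedge \mathit{valid\_its}^{\eta+1}
\end{cases}
\end{aligned}
\]
Here, $dec(\tilde{B})$ reduces the first and thus oldest value
by one, which represents that a deadline induced by the event was
processed. We define the size of an entry in $\tilde{B}$ as
\[
size(\tilde{B}^{\eta}_{i}) = \begin{cases} 0 & \text{if } \tilde{B}^{\eta}_{i} = \bot \\ \tilde{B}^{\eta}_{i} + 1 & \text{otherwise} \end{cases}
\]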
The proof follows from three facts.
1) bl is the sum of the sizes of $\tilde{B}$'s entries, i.e., for any event $e_i$ that
reaches the buffer in cycle $\eta_i$, the following holds:
\[
bl(e_i) = \sum_{j=1}^{L} size(\tilde{B}^{\eta_i}_{j}) \tag{1}
\]
Proof by induction on the event sequence consisting of the events
$e_1, e_2, \dots$. Assume $\eta_i$ is the clock cycle in which $e_i$ arrives at the
buffer. For $\eta_0$:
\[
\sum_{j=1}^{L} size(\tilde{B}^{\eta_0}_{j}) = \sum_{j=1}^{L} 0 = 0 = bl(e_0)
\]
In the induction step, we go from $\eta_i$ to $\eta_{i+1}$. Note that these two
points in time are separated by δ clock cycles, i.e., $\eta_{i+1} = \eta_i + \delta$.
In the first δ − 1 of these steps, no new value arrives at the buffer, so
$\forall j \in \{1, \dots, \delta - 1\} : \neg \mathit{valid\_its}^{\eta_i + j}$. Thus, by definition of $\tilde{B}$, the sum
of the abstract entries decreases by 1 for each hclk cycle
unless the buffer is already empty. In this case, the value does not
change. In cycle $\eta_{i+1}$, however, the buffer additionally receives a
new value, so the sum of the entries also increases by $dld(e_{i+1}) + 1$.
Formally:
\[
\begin{aligned}
\sum_{j=1}^{L} size(\tilde{B}^{\eta_{i+1}}_{j})
&= \sum_{j=1}^{L} size(\tilde{B}^{\eta_{i+1}-1}_{j}) + (dld(e_{i+1}) + 1) - 1 \\
&= \sum_{j=1}^{L} size(\tilde{B}^{\eta_i}_{j}) - \min\Big\{\sum_{j=1}^{L} size(\tilde{B}^{\eta_i}_{j}),\ \delta - 1\Big\} + dld(e_{i+1}) \\
&= bl(e_i) - \min\{bl(e_i), \delta - 1\} + dld(e_{i+1}) && \text{(IH)} \\
&= bl(e_{i+1})
\end{aligned}
\]
The next fact can be proven using Equation (1):
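2) The abstract buffer cannot overflow. More concretely:
\[
L \geq \max\{bl\} \implies \forall \eta : \neg \mathit{valid\_its}^{\eta} \vee \tilde{B}^{\eta-1}_{1} = 0 \vee \tilde{B}^{\eta-1}_{L} = \bot \tag{2}
\]
Assume $L \geq \max\{bl\}$ and $\mathit{valid\_its}^{\eta} \wedge \tilde{B}^{\eta-1}_{1} > 0 \wedge \tilde{B}^{\eta-1}_{L} \neq \bot$.
Since $\mathit{valid\_its}^{\eta}$ holds, we know that a new event arrived. If it is the first
event, i.e., $\eta = \eta_0$, the contradiction follows from the definition of
$\tilde{B}$. Otherwise, let $\eta = \eta_{i+1}$. We inspect the last δ steps. We know
that no new value arrived, and because $\tilde{B}^{\eta-1}_{L} \neq \bot$ holds, there was
no shift.
\[
\tilde{B}^{\eta_{i+1}-1-\delta}_{1} = \tilde{B}^{\eta_i - 1}_{1} = \delta + \tilde{B}^{\eta_{i+1}-1}_{1} \geq \delta + 1 \tag{3}
\]
As a result:
\[
\begin{aligned}
bl(e_i) &= \sum_{j=1}^{L} size(\tilde{B}^{\eta_i}_{j}) && \text{(Eq. (1))} \\
&\geq \sum_{j=2}^{L} size(\tilde{B}^{\eta_i}_{j}) + \delta + 1 && \text{(Eq. (3))} \\
&\geq \delta + 1 + L - 1 && (\delta > 0) \\
&\geq L + 1
\end{aligned}
\]
This contradicts $L \geq \max\{bl\}$.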
3) Each entry of the abstract buffer corresponds to an entry in the
concrete buffer.
\[
\forall \eta, i : \tilde{B}^{\eta}_{i} = \bot \iff B^{\eta}_{i} = \bot \tag{4}
\]
The equation holds by the definitions of $\tilde{B}$ and B. The proof itself
consists of correct bookkeeping of the buffer states and respective
signal values.
By Equation (4) we know that each empty entry in the abstract
buffer is also empty in the concrete buffer. Moreover, Equation (2)
verifies that the abstract buffer never overflows. Thus, the concrete
buffer cannot overflow either, concluding the proof. □
[Figure 5: Schematic of the Low-level Controller receiving event and deadline information from the queue and evaluating streams accordingly. The schematic shows the LLQInterface and the EvalController together with the signals d_in, empty, pop, and een.]

[Figure 6: State machines for the LLQInterface and the EvalController. (a) The LLQInterface handles the communication with the queue in pop, and waits in eval until the evaluation finished. (b) The EvalController manages the evaluation. State 1 treats input streams, states 2.1 through 2.ℓ treat output streams according to the evaluation order.]

[Figure 7: Input and output signals of input and output streams. Input streams in_i: upd, d_in, done, d_out. Output streams out_j: pe, eval, w_in, dep_in, done, d_out. Windows w_η: evict, upd, req, d_in, done, d_out.]
3.4 Low-level Controller (LLC)
The LLC receives elements from the queue and evaluates streams
according to the information received. After the evaluation, it checks
for violated properties and triggers an alarm if appropriate.
As can be seen in Figure 5, it consists of an LLQInterface component,
which communicates with the queue and triggers an evaluation
process taking place in the EvalController.
LLQInterface. This component consists of a three-state machine
depicted in Figure 6a. In the idle state, it waits on new inputs
from the queue. On a falling edge of empty, it transitions into the
pop state, raising the pop signal for one sclk cycle. At the end of
this cycle, it unconditionally transitions to eval, setting the evaluation
enable latch een. This signals the EvalController that valid
data is on the d_in wire, so an evaluation can be started. After the
evaluation is completed, the EvalController clears the een signal.
Depending on the current queue state, the LLQInterface then
transitions back to idle or pop.
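The three-state machine can be written down as a transition function. This is a simplified sketch: `llq_step` is a hypothetical name, the falling edge of empty is approximated by its level, and `een` is treated as an input that stays high until the EvalController clears it.

```python
def llq_step(state, empty, een):
    """Next state of the LLQInterface for the current empty and een signals."""
    if state == "idle":
        return "idle" if empty else "pop"    # wait for queue content
    if state == "pop":
        return "eval"                         # unconditional transition
    # state == "eval": wait until the EvalController has cleared een
    if een:
        return "eval"
    return "idle" if empty else "pop"


state = "idle"
for empty, een in [(True, False), (False, False), (False, True),
                   (False, True), (True, False)]:
    state = llq_step(state, empty, een)
    print(state)
# idle, pop, eval, eval, idle
```

The pop signal corresponds to being in the pop state, so it is raised for exactly one cycle per queue element.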
EvalController. This component is a state machine as depicted in
Figure 6b with ℓ + 2 states where $\ell = \max\{\ell \in \mathbb{N} \mid \exists s_1, \dots, s_\ell : s_1 \prec \dots \prec s_\ell\}$
is the number of layers of the evaluation order (see
Figure 6b). In addition to the state machine, there are $n_{\downarrow}$/$n_{\uparrow}$/$n_w$
input/output/window components. In the following, components
and signals indexed with i, j, η refer to inputs, outputs, and windows,
respectively.
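The number of layers ℓ is the length of the longest dependency chain. The following sketch computes the layer of each stream under two assumptions that are not taken from the text: only dependencies between output streams are counted, and the dependency graph is acyclic. The name `layers` and the example streams are hypothetical.

```python
def layers(deps):
    """Map each stream to its layer: 1 + the highest layer among its dependencies."""
    memo = {}

    def layer(stream):
        if stream not in memo:
            memo[stream] = 1 + max((layer(d) for d in deps.get(stream, ())), default=0)
        return memo[stream]

    return {stream: layer(stream) for stream in deps}


# Hypothetical output streams: d needs b and c, which both need a.
diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(layers(diamond))                      # {'a': 1, 'b': 2, 'c': 2, 'd': 3}
print(max(layers(diamond).values()))        # 3 layers, i.e. states 2.1 to 2.3

independent = {"x": [], "y": [], "z": []}
print(max(layers(independent).values()))    # 1, all streams evaluate in parallel
```

Streams in the same layer do not depend on each other and can therefore be evaluated in the same 2.x state.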
In the idle state, the EvalController waits on a rising edge
of een, on which it transitions to state 1. This state corresponds to
a so-called pseudo-extension phase, where all output streams that
get a new value in this evaluation cycle are extended by a pseudo
value #. This value will never be used in a computation but allows
for resolving offsets correctly without shifting the offsets depending
on the evaluation status of the target stream. Input streams
are immediately extended by their new values, and windows evict
outdated buckets. Thus:
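\[
\begin{aligned}
\forall i \leq n_{\downarrow} &: \mathit{upd}_i = d_{in}\Big[\sum_{n \leq i} (s_n + 1)\Big] \\
\forall j \leq n_{\uparrow} &: \mathit{pe}_j = d_{in}\Big[\sum (s_i + 1) + s_{ts} + j\Big] \\
\forall \eta \leq n_w &: \mathit{evict}_\eta = 1
\end{aligned}
\]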
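The structure of input, output and window components is depicted
in Figure 7. In the input stream components we get the following
behavior for a rising edge in $\mathit{upd}_i$, where κ(i) describes the greatest
offset of any lookup with target i:
\[
\begin{aligned}
\mathit{done}^{t} &= \mathit{upd}^{t} \\
R^{0}_{n} &= 0^{s_i + 1} \\
R^{t+1}_{n} &= \begin{cases}
R^{t}_{n+1} & \text{if } \mathit{upd}^{t+1} \wedge n \neq \kappa(i) \\
R^{t}_{n} & \text{if } \neg \mathit{upd}^{t+1} \\
d_{in}^{t+1} \circ 1 & \text{if } \mathit{upd}^{t+1} \wedge n = \kappa(i)
\end{cases} \\
d_{out}^{0} &= 0^{\kappa(i) \cdot (s_i + 1)} \\
d_{out}^{t+1} &= R^{t}_{1} \circ \dots \circ R^{t}_{\kappa(i)}
\end{aligned}
\]
By storing κ(i) values for any stream i, all offsets can be resolved
when evaluating stream expressions.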
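The register chain of an input stream behaves like a fixed-length shift register in which each value is paired with a validity bit. The sketch below illustrates this; `extend_input` and `read` are hypothetical helper names, and the default value passed to `read` stands for the default given in the specification.

```python
def extend_input(regs, value):
    """Shift all registers by one position and store (value, valid=1) in the last one."""
    return regs[1:] + [(value, 1)]


def read(regs, back, default):
    """Value stored `back` positions before the newest one, or `default` if invalid."""
    value, valid = regs[-1 - back]
    return value if valid else default


regs = [(0, 0), (0, 0)]          # two registers, initially invalid
regs = extend_input(regs, 7)
print(read(regs, 0, -1))         # 7
print(read(regs, 1, -1))         # -1, no older value exists yet
regs = extend_input(regs, 9)
print(read(regs, 1, -1))         # 7
```

Keeping the validity bit next to each value is what allows a lookup to fall back to its default without any extra bookkeeping.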
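Output streams on a rising edge of pe behave as follows:
\[
\begin{aligned}
\mathit{done}^{t} &= \mathit{pe}^{t} \\
R^{0}_{n} &= 0^{\kappa(j) \cdot (s_j + 1)} \\
R^{t+1}_{n} &= \begin{cases} \# & \text{if } n = \kappa(j) \\ R^{t}_{n+1} & \text{otherwise} \end{cases} \\
d_{out}^{0} &= 0^{\kappa(j) \cdot (s_j + 1)} \\
d_{out}^{t+1} &= R^{t}_{1} \circ \dots \circ R^{t}_{\kappa(j)}
\end{aligned}
\]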
For windows, the number of buckets is β, i.e., the length of the
window $dur_\eta$ multiplied with the extend frequency $f_\eta$ of the stream
in which the window occurs. On a rising edge of evict, $d_{in}$ carries
the current time stamp in the first $s_{ts}$ bits. The window requires
this information to decide whether buckets are outdated. If so,
the values of all registers are shifted and the now-empty bucket is
initialized with ε. The internal T register stores the time when the
next bucket becomes outdated.
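\[
\begin{aligned}
T^{0} &= 0^{s_{ts}} \\
T^{t+1} &= \begin{cases} T^{t} & \text{if } d_{in}[\dots s_{ts}] \leq T^{t} \\ T^{t} + f_\eta & \text{otherwise} \end{cases} \\
\mathit{done}^{0} &= 0 \\
\mathit{done}^{t+1} &= d_{in}[\dots s_{ts}] \leq T^{t} \\
R^{0}_{n} &= \varepsilon \\
R^{t+1}_{n} &= \begin{cases}
\varepsilon & \text{if } n = \beta \wedge d_{in}[\dots s_{ts}] > T^{t} \\
R^{t}_{n+1} & \text{if } n \neq \beta \wedge d_{in}[\dots s_{ts}] > T^{t} \\
R^{t}_{n} & \text{if } d_{in}[\dots s_{ts}] \leq T^{t}
\end{cases}
\end{aligned}
\]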
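The eviction step can be paraphrased in software as repeatedly dropping the oldest bucket until the current time stamp no longer exceeds T. In this sketch, `evict` and `evict_all` are hypothetical names, `step` stands for the amount by which T advances per bucket, and integer time stamps are used for simplicity.

```python
def evict(buckets, T, now, step, eps):
    """One eviction step: returns the new buckets, the new T, and the done flag."""
    if now <= T:
        return buckets, T, True                   # nothing outdated, done
    return buckets[1:] + [eps], T + step, False   # shift and append an empty bucket


def evict_all(buckets, T, now, step, eps):
    """Repeat eviction steps until the window signals done."""
    done = False
    while not done:
        buckets, T, done = evict(buckets, T, now, step, eps)
    return buckets, T


# Three buckets of 10 time units each; the time stamp 25 outdates two of them.
print(evict_all([1, 2, 3], T=10, now=25, step=10, eps=0))  # ([3, 0, 0], 30)
```

The number of iterations is bounded by the number of buckets that became outdated since the last evaluation, which is why the window reports done separately from the other components.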
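Signal $\mathit{done}_1$ indicates that phase 1 of the evaluation is complete:
\[
\mathit{done}_1 = \bigwedge_{i \leq n_{\downarrow}} (\mathit{upd}_i \implies \mathit{done}_i) \wedge \bigwedge_{j \leq n_{\uparrow}} (\mathit{pe}_j \implies \mathit{done}_j) \wedge \bigwedge_{\eta \leq n_w} \mathit{done}_\eta
\]
Note that the implication ensures that a done signal is only relevant
if the respective component was enabled.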
Aer done1 is raised, the EvalController transitions to phase
2 via state 2.1. In the 2.x states, streams are successively ex-
tended according to the evaluation order and windows are updated
whenever the target stream computed a new value. Wires connect
streams and windows w.r.t. their dependencies, i.e., all streams out-
put a sequence of values coupled with a bit indicating its validity.
Invalid values are then replaced with the default values specified
in the stream expression. Window lookups require an additional
computation step, initiated by the req signal.
Formally, when transitioning to state 2.x with 1 ≤ x ≤ ℓ, the
EvalController raises the update signals for outputs and win-
dows if appropriate,i.e., if the stream is in the respective evaluation
layer and the HLC indicated that the stream is affected.
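\[
\begin{aligned}
\mathit{eval}_j &= j \in layer(x) \wedge d_{in}\Big[\sum (s_i + 1) + s_{ts} + j\Big] \\
\mathit{upd}_\eta &= d_{out}^{tar(\eta)}[s_{tar(\eta)}]
\end{aligned}
\]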
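On a rising edge of $\mathit{eval}_j$, the output stream computes its new value
and updates its internal state:
\[
\begin{aligned}
\mathit{done}^{t} &= \mathit{eval}^{t} \\
R^{t+1}_{n} &= \begin{cases} evalexpr(j) \circ 1 & \text{if } \mathit{eval}^{t+1} \wedge n = \kappa(j) \\ R^{t}_{n} & \text{otherwise} \end{cases} \\
d_{out}^{t+1} &= R^{t}_{1} \circ \dots \circ R^{t}_{\kappa(j)}
\end{aligned}
\]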
Here, evalexpr(j) is the result of evaluating the stream expression
of stream j. The computation can be split into several computation
steps depending on the size of the expression to increase the
maximum system clock frequency. In this case, the done bit cannot be
set immediately after receiving the eval command. Note that only
# values are overwritten in this step and the valid bit is set. In sliding
windows, a new value is added by applying the map function
and reducing it onto the last bucket.
\[
\begin{aligned}
R^{t+1}_{\beta} &= R^{t}_{\beta} \oplus map(d_{in}^{t+1}) \\
\mathit{done}^{t} &= \mathit{upd}^{t}
\end{aligned}
\]
Computing the new value of the sliding window requires an additional
step. This process is initiated by the EvalController by
raising the $\mathit{req}_\eta$ flag after the window's target stream was computed.
All bucket values get reduced using the aggregation's reduction
function ⊕, and finalized afterwards. Since ⊕ is associative and
the number of buckets is a compile-time constant, the reduction is
structured as a binary tree with logarithmic depth in the number
of buckets. Raising req triggers the following behavior in the window:
\[
\begin{aligned}
d_{out}^{t+1} &= fin(R^{t}_{1} \oplus \dots \oplus R^{t}_{\beta}) \\
\mathit{done}^{t}_{2.x} &= \mathit{req}^{t}
\end{aligned}
\]
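The tree-shaped reduction can be sketched as a pairwise reduction over levels. The name `tree_reduce` is hypothetical, and the function additionally returns the number of levels to make the logarithmic depth visible; it assumes at least one bucket.

```python
from operator import add


def tree_reduce(buckets, op):
    """Reduce adjacent pairs level by level; returns (result, number of levels)."""
    level = list(buckets)
    depth = 0
    while len(level) > 1:
        # Combine neighbors; the order of operands is preserved, so only
        # associativity of `op` is required, not commutativity.
        nxt = [op(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # an odd element is carried to the next level
        level = nxt
        depth += 1
    return level[0], depth


print(tree_reduce([1, 2, 3, 4, 5], add))        # (15, 3)
print(tree_reduce(list(range(1, 9)), add))      # (36, 3)
```

For β buckets the depth is ⌈log₂ β⌉, compared to β − 1 sequential steps for a linear fold.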
4 CASE STUDY
We validated the compilation with three case studies. The first
two monitor a network and an avionics system and describe realistic
scenarios, whereas the third one consists of synthetic data and
emphasizes the benefits of the parallel evaluation structure presented
in Section 3. All specifications were compiled into VHDL code
and then synthesized on a Zynq-Z-7010 ARM/FPGA SoC Trainer
Board³, which is logic-equivalent to an Artix-7 FPGA. The Zynq-7000
features 4,400 logic slices, each with four 6-input LUTs and
8 flip-flops.
Note that the specifications in the benchmarks are simplified
for illustration purposes. The current prototype does not support
a floating-point or fixed-point unit. The limitation is a result of technical
incompatibilities in the Xilinx synthesis software; from a theoretical
standpoint, the inclusion of a floating-point unit is possible.
This results, however, in a larger circuit realization of the specifications.
4.1 Avionics
Figure 8 shows a specification for a drone. Input events consist
of longitude and latitude values, the velocity and the number of
GPS satellites in range. The GPS module is supposed to send values
for the longitude and latitude with a frequency of 10 Hz. Output
stream gps_freq counts the number of samples received within a
second and checks if it falls below 9. In this case, the first trigger
reports the unexpectedly low sample frequency. The second
trigger reports a warning when the drone's velocity drops below
700, requiring that the velocity was greater than 700 before that.
For the third trigger, we use a simplified reconstruction of the distance
the drone traveled using the Pythagorean theorem. A more
realistic approximation can be obtained, e.g., by using the haversine
function. The square root computation is realized using the
constant-time function proposed by Li and Chu [29]. The distance
is then discretely differentiated to compute the velocity according
to the GPS module. This allows for cross-validating sensor values
by comparing the sensed input velocity with the computed one. If
the two values deviate too strongly, an alarm is raised. Lastly, we
detect hover phases by integrating either velocity value and checking
whether it lies below a threshold value.
We compiled the specification to VHDL and synthesized a circuit
on the Zynq-7000 board. We report the resource consumption
in terms of required flip-flops (FF), look-up tables (LUT), multiplexers
(MUX), adders (CA), and multipliers (MULT) for each component
below, where "Mon" describes the entire synthesized monitor:

Component   FF     LUT    MUX   CA    MULT
Mon         3036   3685   26    656   18
HLC          901    156    0     22    0
Q            543    442    0     43    0
LLC         1281   2820    0    576   18

³ https://reference.digilentinc.com/reference/programmable-logic/zybo/reference-manual?_ga=2.102758273.1814454663.1555084001-1980681841.1546416239
Note that the amount of resources like flip-flops of the entire
monitor is not equal to the sum of the resources of all components.
The difference is required for internal tasks such as signal management.
One can see that most flip-flops reside in the LLC because
it manages the persisted values of all streams. The HLC requires
around 70% as many, which can be attributed to the fact that
each component of the HLC contains internal registers while the
greatest offset in the specification is only −1, reducing the memory
requirement of the LLC. The overwhelming majority of look-up tables,
adders, and multipliers reside in the LLC, which was expected
given that this component implements the evaluation logic. The 18
multipliers are required for squaring the δ-values and computing
the integral window.
The power consumption amounted to 0.121 W when idle and
1.620 W when processing.
We tested the monitor in online mode with sensor data created
in a simulation using the ArduPilot⁴ Copter⁵ drone simulator. The
simulation consisted of a multicopter flying over the campus of a
university. Sensor information was piped to the monitor over a
serial port. Evaluating events and periodic deadlines took on average
428 system clock cycles at a system clock frequency of 100 MHz,
i.e., with a period ξ = 10 ns. Thus, each event took on average
4.28 µs to be processed. Here, the worst
slack amounted to 1.653 ns.
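As a quick sanity check of the reported latency, assuming that the 100 MHz figure denotes the system clock frequency, so that one cycle lasts 10 ns:

\[
t_{\text{event}} = 428 \cdot \frac{1}{100\,\mathrm{MHz}} = 428 \cdot 10\,\mathrm{ns} = 4.28\,\mu\mathrm{s}
\]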
4.2 Network Monitoring
The network monitoring exerted immense pressure on the monitor
due to the sheer amount of input data received in a short
amount of time. In this setting it is also reasonable to forgo any
assumption on the input frequency.
The specification in Figure 9 fixes the IP of one particular server
and checks network traffic based on the source and destination IP
of requests, TCP flags, and the length of the payload. First, the
length stream is filtered based on whether the server is the target
and the request pushes data. We sum up the filtered stream for a
second and trigger an alert if the amount of data spikes over 10 MB.
Moreover, we count the number of opened and closed incoming
connections and issue an alert if the server attempts to close more
connections than were opened. Lastly, we check for a significant
number of incoming connections in a short amount of time.
Due to the lower complexity of the specification, the resource
consumption is also generally lower compared to the avionics example.
The number of look-up tables decreases by around 60%,
adders by 65% and multipliers by 100%. The number of flip-flops
only decreases by around 38% since there is no significant difference
in the number of sliding windows and lookup expressions
in the two specifications, but integral windows require five times as
much memory as summation and count windows.
4hp://ardupilot.org/
5hp://ardupilot.org/copter/index.html
Component FF LUT MUX CA MULT
Mon 1905 1533 23 226 0
HLC 550 161 0 37 0
Q 330 342 0 28 0
LLC 895 927 0 161 0
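The relative savings quoted for the network specification can be recomputed from the "Mon" rows of the two resource tables. The snippet below is merely a convenience sketch; the helper name `decrease_percent` is hypothetical.

```python
def decrease_percent(before, after):
    """Relative decrease from `before` to `after` in percent."""
    return 100 * (before - after) / before


avionics = {"FF": 3036, "LUT": 3685, "CA": 656, "MULT": 18}
network = {"FF": 1905, "LUT": 1533, "CA": 226, "MULT": 0}

for resource in avionics:
    change = decrease_percent(avionics[resource], network[resource])
    print(f"{resource}: {change:.1f}% fewer")
# FF: 37.3% fewer
# LUT: 58.4% fewer
# CA: 65.5% fewer
# MULT: 100.0% fewer
```

The results agree with the rounded figures in the text up to about two percentage points.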
The power consumption amounted to 0.120 W when idle and
1.570 W when processing, so there is no significant difference between
the two specifications.
We tested the implementation with data from the Mid-Atlantic
Collegiate Cyber Defense Competition (MACCDC).⁶ We replayed
the log data in real time using the time stamps provided.
While the evaluation process is simpler, the HLC remains mostly
the same. Thus, the number of system clock cycles required per
event only decreases by around 25%; the response time for a single
event is 3.2 µs on average. The worst slack time, however,
increased by 150% to 4.0 ns. This allows for safely increasing the
system clock frequency by up to 200 MHz. The reason for this is
that the square root computation in the avionics specification has
a significantly greater depth than all operations performed while
monitoring the network. Since the computation is carried out in a
single cycle, the slack time decreases significantly.
4.3 Parallelization
Section 3.4 presents a compilation that produces a highly parallel
evaluation process by identifying modular structures within the
specification. The modularity is maximized when a specification
contains a large number of independent streams. Practical examples
of this kind of specification are command-response or geofencing
specifications. Here, each reaction and each face of the fence
constitutes an independent stream, allowing for a parallel evaluation.
More concretely, consider a system that receives different commands
from an external entity and needs to verify the system health
depending on the kind of command. Such a specification can be
found in Figure 10. The highly disjunctive nature allows for perfect
parallelization: each output stream solely depends on input
streams. In this case study, the specification is realized twice, once
as proposed in Section 3, and once without the parallelization of
the evaluation. Purposefully declared spurious dependencies between
successive output streams enforce a sequential evaluation.
Figure 10 contains an extract of the specification.
Neither the size of the realization nor the power consumption
when idle varied between the realizations. A stress test successively
increases the input data rate until the LLC can no longer
process events in time. For this, the companion processor on the
Zynq sends events to the FPGA and measures the time it takes
for the FPGA to produce an output. This measurement produces
more robust results than the communication over a bus in the preceding
case studies but can only be applied in the absence of periodic
streams. When processing events at the maximum frequency
for each realization, the parallel realization requires slightly more
power (1.582 W) than the sequential one (1.581 W). In contrast,
the execution time varies significantly. The sequential execution
requires 43.83 µs, whereas the speed of the parallel execution
exceeds the computation speed of the processor, which is up to
866 MHz, i.e., 3.77 µs between sending an event and attempting to
read the output. As a result, the measured 3.77 µs constitute an
upper bound on the actual response time. Practically, this means
that if the processor sends events to the FPGA with its maximum
frequency, the parallel realization can process all events, whereas
the sequential one loses 89% of the data.

⁶ https://www.netresec.com/?page=MACCDC

    input lat, lon, velo: Int32
    input gps: UInt8
    output gps_freq@1Hz: bool :=
        lat.aggregate(over:1s, using:count).defaults(to:10) < 9
    trigger gps_freq "GPS frequency less than 9 Hz"
    output fast := velo > 700
    trigger fast.offset(by:-1).defaults(to:false) & !fast
        "Slowing down"
    output gps_dist := sqrt(δ(lon)^2 + δ(lat)^2)
    output gps_velo := gps_dist / δ(time)
    trigger abs(gps_velo - velo) > 10 "Sensor deviation"
    output hovering@1Hz :=
        velo.aggregate(over:5s, using:∫).defaults(to:5) < 1
    trigger hovering "Little distance covered"

Figure 8: RTLola specification for monitoring a drone.

    constant server: Int32 = ...
    input src, dst: Int32
    input fin, push, syn: bool
    input length: Int32
    output receiver := dst = server
    trigger @1Hz
        receiver.aggregate(over:0.5s, using:Σ) > 10000
        "Many incoming connections"
    output received := if receiver & push
                       then length
                       else 0
    output workload@1Hz :=
        received.aggregate(over:1s, using:Σ)
    trigger workload > 10^7 "Workload too high"
    output opened :=
        opened.offset(by:-1).defaults(to:0) +
        (if dst = server & syn then 1 else 0)
    output closed :=
        closed.offset(by:-1).defaults(to:0) +
        (if dst = server & fin then 1 else 0)
    trigger opened - closed < 0
        "Closed more connections than were open"

Figure 9: RTLola specification for monitoring network traffic.
    input cmd: Int16
    input height, x, y, ...: Int32
    output health_crit_1: Bool := height < 400
    trigger health_crit_1 ∧ cmd = 1
    ...
    output health_crit_512: Bool :=
        x > 700 ∨ y < 250 ∧ height > 300
    trigger health_crit_512 ∧ cmd = 512

Figure 10: RTLola specification for a highly parallelizable property.

5 CONCLUSION
We have presented a hardware-based monitoring approach for
stream-based real-time specifications by compiling RTLola specifications
to circuits on FPGAs. The resulting circuits are small and efficient.
Unlike interpreter-based approaches, the compiler limits the circuits
to the operations in the specification and allows for a high
degree of parallelization. The presented case studies show that
FPGA-based stream monitoring is feasible for non-trivial specifications.
While we used a small board, less than 50% of the available
resources were utilized and the power consumption was
around 1.5 W under maximal pressure. This makes the approach
suitable for integration into embedded systems without draining
the available resources.
Building on the work presented in this paper, the next step is to
extend the FPGA approach to stream specifications with parameterization
[16] and to investigate the applicability of FPGA-based
monitoring in distributed architectures.
ACKNOWLEDGMENTS
This work was partially supported by the German Research Foundation
(DFG) as part of the Collaborative Research Center Foundations
of Perspicuous Software Systems (TRR 248, 389792660),
and by the European Research Council (ERC) Grant OSARES (No.
683300).
REFERENCES
[1] Florian-Michael Adolf, Peter Faymonville, Bernd Finkbeiner, Sebastian
Schirmer, and Christoph Torens. Stream runtime monitoring on UAS. In
Shuvendu K. Lahiri and Giles Reger, editors, Runtime Verification - 17th International
Conference, RV 2017, Seattle, WA, USA, September 13-16, 2017, Proceedings,
volume 10548 of Lecture Notes in Computer Science, pages 33–49. Springer, 2017.
[2] David A. Basin, Felix Klaedtke, Samuel Müller, and Eugen Zalinescu. Monitoring
metric first-order temporal properties. J. ACM, 62(2):15:1–15:45, 2015.
[3] David A. Basin, Srdjan Krstic, and Dmitriy Traytel. AERIAL: almost event-rate
independent algorithms for monitoring metric regular properties. In Giles
Reger and Klaus Havelund, editors, RV-CuBES 2017. An International Workshop
on Competitions, Usability, Benchmarks, Evaluation, and Standardisation for Runtime
Verification Tools, September 15, 2017, Seattle, WA, USA, volume 3 of Kalpa
Publications in Computing, pages 29–36. EasyChair, 2017.
[4] Gerard Berry. Formally unifying modeling and design for embedded systems -
A personal view. In Tiziana Margaria and Bernhard Steffen, editors, Leveraging
Applications of Formal Methods, Verification and Validation: Discussion, Dissemi-
nation, Applications - 7th International Symposium, ISoLA 2016, Imperial, Corfu,
Greece, October 10-14, 2016, Proceedings, Part II, volume 9953 of Lecture Notes in
Computer Science, pages 134–149, 2016.
[5] Gérard Berry and Georges Gonthier. The Esterel synchronous programming
language: Design, semantics, implementation. Sci. Comput. Program.,
19(2):87–152, 1992.
[6] Marc Boule and Zeljko Zilic. Automata-based assertion-checker synthesis of
PSL properties. ACM Trans. Design Autom. Electr. Syst., 13(1):4:1–4:21, 2008.
[7] Paul Caspi, Daniel Pilaud, Nicolas Halbwachs, and John Plaice. Lustre: A declarative
language for programming synchronous systems. In Conference Record of
the Fourteenth Annual ACM Symposium on Principles of Programming Languages,
Munich, Germany, January 21-23, 1987, pages 178–188. ACM Press, 1987.
[8] Ping Hang Cheung and Alessandro Forin. A C-language binding for PSL. In
Yann-Hang Lee, Heung-Nam Kim, Jong Kim, Yongwan Park, Laurence Tianruo
Yang, and Sung Won Kim, editors, Embedded Software and Systems, [Third]
International Conference, ICESS 2007, Daegu, Korea, May 14-16, 2007, Proceedings,
volume 4523 of Lecture Notes in Computer Science, pages 584–591. Springer, 2007.
[9] Christian Colombo and Martin Leucker, editors. Runtime Verification - 18th In-
ternational Conference, RV 2018, Limassol, Cyprus, November 10-13, 2018, Proceed-
ings, volume 11237 of Lecture Notes in Computer Science. Springer, 2018.
[10] Lukas Convent, Sebastian Hungerecker, Torben Scheffel, Malte Schmitz, Daniel
Thoma, and Alexander Weiss. Hardware-based runtime verification with embedded
tracing units and stream processing. In Colombo and Leucker [9], pages
43–63.
[11] Anat Dahan, Daniel Geist, Leonid Gluhovsky, Dmitry Pidan, Gil Shapir, Yaron
Wolfsthal, Lyes Benalycherif, Romain Kamdem, and Younes Lahbib. Combining
system level modeling with assertion based verification. In 6th International
Symposium on ality of Electronic Design (ISQED 2005), 21-23 March 2005, San
Jose, CA, USA, pages 310–315. IEEE Computer Society, 2005.
[12] Ben D’Angelo, Sriram Sankaranarayanan, César Sánchez, Will Robinson, Bernd
Finkbeiner, Henny B. Sipma, Sandeep Mehrotra, and Zohar Manna. LOLA: run-
time monitoring of synchronous systems. In 12th International Symposium on
Temporal Representation and Reasoning (TIME 2005), 23-25 June 2005, Burlington,
Vermont, USA, pages 166–174. IEEE Computer Society, 2005.
[13] Normann Decker, Philip Gottschling, Christian Hochberger, Martin Leucker,
Torben Scheffel, Malte Schmitz, and Alexander Weiss. Rapidly adjustable
non-intrusive online monitoring for multi-core systems. In Simone André
da Costa Cavalheiro and José Luiz Fiadeiro, editors, Formal Methods: Foundations
and Applications - 20th Brazilian Symposium, SBMF 2017, Recife, Brazil, November
29 - December 1, 2017, Proceedings, volume 10623 of Lecture Notes in
Computer Science, pages 179–196. Springer, 2017.
[14] Jyotirmoy V. Deshmukh, Alexandre Donzé, Shromona Ghosh, Xiaoqing Jin,
Garvit Juniwal, and Sanjit A. Seshia. Robust online monitoring of signal tempo-
ral logic. Formal Methods in System Design, 51(1):5–30, 2017.
[15] Doron Drusinsky. The temporal rover and the ATG rover. In Klaus Havelund,
John Penix, and Willem Visser, editors, SPIN Model Checking and Software Verification,
7th International SPIN Workshop, Stanford, CA, USA, August 30 - September
1, 2000, Proceedings, volume 1885 of Lecture Notes in Computer Science,
pages 323–330. Springer, 2000.
[16] Peter Faymonville, Bernd Finkbeiner, Sebastian Schirmer, and Hazem Torfah. A
stream-based specification language for network monitoring. In Yliès Falcone
and César Sánchez, editors, Runtime Verification - 16th International Conference,
RV 2016, Madrid, Spain, September 23-30, 2016, Proceedings, volume 10012 of Lec-
ture Notes in Computer Science, pages 152–168. Springer, 2016.
[17] Peter Faymonville, Bernd Finkbeiner, Malte Schledjewski, Maximilian
Schwenger, Marvin Stenger, Leander Tentrup, and Hazem Torfah. StreamLAB:
Stream-based monitoring of cyber-physical systems. In Isil Dillig and Serdar
Tasiran, editors, Computer Aided Verification - 31st International Conference,
CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I, volume
11561 of Lecture Notes in Computer Science, pages 421–431. Springer, 2019.
[18] Peter Faymonville, Bernd Finkbeiner, Maximilian Schwenger, and Hazem Tor-
fah. Real-time stream-based monitoring. CoRR, abs/1711.03829, 2017.
[19] Bernd Finkbeiner and Lars Kuhtz. Monitor circuits for LTL with bounded and
unbounded future. In Saddek Bensalem and Doron A. Peled, editors, Runtime
Verification, 9th International Workshop, RV 2009, Grenoble, France, June 26-28,
2009. Selected Papers, volume 5779 of Lecture Notes in Computer Science, pages
60–75. Springer, 2009.
[20] Bernd Finkbeiner and Henny Sipma. Checking finite traces using alternating
automata. Formal Methods in System Design, 24(2):101–127, 2004.
[21] Felipe Gorostiaga and César Sánchez. Striver: Stream runtime verification for
real-time event-streams. In Colombo and Leucker [9], pages 282–298.
[22] Nicolas Halbwachs. A synchronous language at work: the story of Lustre. In
3rd ACM & IEEE International Conference on Formal Methods and Models for Co-
Design (MEMOCODE 2005), 11-14 July 2005, Verona, Italy, Proceedings, pages 3–
11. IEEE Computer Society, 2005.
[23] Klaus Havelund and Grigore Rosu. Synthesizing monitors for safety properties.
In Joost-Pieter Katoen and Perdita Stevens, editors, Tools and Algorithms for
the Construction and Analysis of Systems, 8th International Conference, TACAS
2002, Held as Part of the Joint European Conference on Theory and Practice of
Software, ETAPS 2002, Grenoble, France, April 8-12, 2002, Proceedings, volume
2280 of Lecture Notes in Computer Science, pages 342–356. Springer, 2002.
[24] Stefan Jaksic, Ezio Bartocci, Radu Grosu, Reinhard Kloibhofer, Thang Nguyen,
and Dejan Nickovic. From signal temporal logic to FPGA monitors. In 13.
ACM/IEEE International Conference on Formal Methods and Models for Codesign,
MEMOCODE 2015, Austin, TX, USA, September 21-23, 2015, pages 218–227. IEEE,
2015.
[25] Ron Koymans. Specifying real-time properties with metric temporal logic. Real-
Time Systems, 2(4):255–299, 1990.
[26] Orna Kupferman and Moshe Y. Vardi. Model checking of safety properties. For-
mal Methods in System Design, 19(3):291–314, 2001.
[27] Insup Lee, Sampath Kannan, Moonjoo Kim, Oleg Sokolsky, and Mahesh
Viswanathan. Runtime assurance based on formal specifications. In Hamid R.
Arabnia, editor, Proceedings of the International Conference on Parallel and Distributed
Processing Techniques and Applications, PDPTA 1999, June 28 - July 1,
1999, Las Vegas, Nevada, USA, pages 279–287. CSREA Press, 1999.
[28] Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A. Tucker.
No pane, no gain: efficient evaluation of sliding-window aggregates over data
streams. SIGMOD Record, 34(1):39–44, 2005.
[29] Yamin Li and Wanming Chu. A new non-restoring square root algorithm and
its VLSI implementation. In 1996 International Conference on Computer Design
(ICCD ’96), VLSI in Computers and Processors, October 7-9, 1996, Austin, TX, USA,
Proceedings, pages 538–544. IEEE Computer Society, 1996.
[30] Hong Lu and Alessandro Forin. The design and implementation of P2V, an architecture
for zero-overhead online verification of software programs. Technical
Report MSR-TR-2007-99, August 2007.
[31] Oded Maler and Dejan Nickovic. Monitoring temporal properties of continuous
signals. In Yassine Lakhnech and Sergio Yovine, editors, Formal Techniques, Modelling
and Analysis of Timed and Fault-Tolerant Systems, Joint International Conferences
on Formal Modelling and Analysis of Timed Systems, FORMATS 2004 and
Formal Techniques in Real-Time and Fault-Tolerant Systems, FTRTFT 2004, Greno-
ble, France, September 22-24, 2004, Proceedings, volume 3253 of Lecture Notes in
Computer Science, pages 152–166. Springer, 2004.
[32] Marcel Maltry. FPGA-based monitoring for stream specification languages. Master’s
thesis, Saarland University, July 2017.
[33] Lambert Meertens. Algorithmics: towards programming as a mathematical
activity. In Towards programming as a mathematical activity. Mathematics and
computer science, pages 289–334, January 1986.
[34] Patrick Moosbrugger, Kristin Y. Rozier, and Johann Schumann. R2U2: moni-
toring and diagnosis of security threats for unmanned aerial systems. Formal
Methods in System Design, 51(1):31–61, 2017.
[35] Patrick Moosbrugger, Kristin Y. Rozier, and Johann Schumann. R2U2: moni-
toring and diagnosis of security threats for unmanned aerial systems. Formal
Methods in System Design, 51(1):31–61, 2017.
[36] Dejan Nickovic and Oded Maler. AMT: A property-based monitoring tool for
analog systems. In Jean-François Raskin and P. S. Thiagarajan, editors, Formal
Modeling and Analysis of Timed Systems, 5th International Conference, FORMATS
2007, Salzburg, Austria, October 3-5, 2007, Proceedings, volume 4763 of Lecture
Notes in Computer Science, pages 304–319. Springer, 2007.
[37] Rodolfo Pellizzoni, Patrick O’Neil Meredith, Marco Caccamo, and Grigore Rosu.
Hardware runtime monitoring for dependable COTS-based real-time embedded
systems. In Proceedings of the 29th IEEE Real-Time Systems Symposium, RTSS
2008, Barcelona, Spain, 30 November - 3 December 2008, pages 481–491. IEEE
Computer Society, 2008.
[38] Amir Pnueli. The temporal logic of programs. In 18th Annual Symposium on
Foundations of Computer Science, Providence, Rhode Island, USA, 31 October - 1
November 1977, pages 46–57. IEEE Computer Society, 1977.
