Learning Concise Models from Long Execution Traces by Jeppu, Natasha Yogananda et al.
Learning Concise Models
from Long Execution Traces
Natasha Yogananda Jeppu, Tom Melham, Daniel Kroening
Dept. of Computer Science
University of Oxford, UK
natasha.yogananda.jeppu@cs.ox.ac.uk
John O’Leary
Intel Corporation
Portland, Oregon
john.w.oleary@intel.com
Abstract—Abstract models of system-level behaviour have applications
in design exploration, analysis, testing and verification. We describe a new
algorithm for automatically extracting useful models, as automata, from
execution traces of a HW/SW system driven by software exercising a use-
case of interest. Our algorithm leverages modern program synthesis tech-
niques to generate predicates on automaton edges, succinctly describing
system behaviour. It employs trace segmentation to tackle complexity for
long traces. We learn concise models capturing transaction-level,
system-wide behaviour, experimentally demonstrating the approach using traces
from a variety of sources, including the x86 QEMU virtual platform and
the Real-Time Linux kernel.
Index Terms—program synthesis, system modelling
I. INTRODUCTION
In modern system design, hardware and software components are
designed by different teams—perhaps even in different companies—
and integrated only when first silicon or a stable hardware emulation
model is available. Early on, it is hard to see how a given hardware
IP will fare under a software workload and how software behaviour
is affected by hardware design decisions. Co-design of hardware and
software to meet desired specifications is therefore very challenging.
Emulation and virtual platforms provide a way to exercise software
components in an environment that simulates the eventual hardware
behaviour. They can easily be instrumented to record execution traces
for analysis, but the traces generated are large and unstructured, so
they are difficult to correlate with a high-level view of the system
and its requirements. Concise, human-readable models that express
high-level hardware-software interactions can provide users with a
better insight into the working of the system. This can, in turn, aid
in design exploration, analysis, testing and verification applications.
Several methods have been proposed to reverse-engineer automata
that model system behaviour from execution traces [1]–[5], but the
labels on transition arcs are limited to Boolean events that are explicit
in the traces. Realizing abstract and understandable models requires
the user to know what abstract conditions are significant in the
system evolution, for example ‘the FIFO became more than half-full’,
and to instrument the system to record such conditions. Extensions
of these traditional algorithms [6], [7] generate Extended Finite-State
Machines (EFSMs) with syntactically-expressed predicates on
transition edges, but require a substantial amount of additional trace
samples from simulation of the learned model for predicate inference.
In this paper we present a new algorithm for model learning
that employs program synthesis [8] to construct transition predicates
that are not explicit in the trace data. Program synthesis has high
computational complexity, so our algorithm uses a trace segmentation
strategy to make it scalable to long execution traces. In principle,
the algorithm should be applicable to trace data obtained from any
source: modelling, emulation, simulation, or the system itself. But in
this paper, we focus our experiments on systems modelled by virtual
platforms or directly in software, and illustrate the motivation for
our work in this setting. In Section IV, we show that our algorithm
achieves high fidelity to published data-sheet diagrams for high-level
system behaviour.
Contribution. The primary contribution of this paper is a new,
scalable method for learning finite-state models from execution trace
data that produces abstract, concise models. Our algorithm integrates
a SAT-based approach with program synthesis techniques. It learns
succinct models from traces without the additional information that
is typically required by state-merging [1]. The resulting models also
feature informative transition-edge predicates that are not explicit in
the trace. We make an algorithmic improvement to model learning
by making it scale to long traces with a segmentation approach.
II. FORMAL MODEL
We suppose that we can collect execution traces of the system we
are interested in by observing a finite set of user-defined variables,
X = {x1, . . . , xk}, over some domain D. (We simplify presentation
by assuming all variables have the same domain.) The set
X ′ = {x′1, . . . , x′k} contains corresponding primed variables, also
over domain D. A primed variable x′i represents an update to the
unprimed variable xi at the end of a discrete step. The variables in
X could stand for concrete values directly observable in the system
or some function of such values, depending on the user’s intent.
A valuation v : X → D maps the variables in X to values in D.
An observation at time step t is a valuation of the variables at that
time, and is denoted by vt. A trace is a sequence of observations
over time; we write a trace σ with n observations as a sequence of
valuations σ = v1, v2, . . . , vn.
Our aim is to construct an automaton from a trace to represent
behaviour captured by the trace. The learned automaton is a Non-
Deterministic Finite Automaton (NFA).
Definition 1 (Non-Deterministic Finite Automaton): An NFA M
= (Q, q0, Σ, F, δ) is a state machine where Q is a finite set of states,
q0 ∈ Q is the initial state, Σ is a finite alphabet, F ⊆ Q is the set of
accepting states, and δ : Q × Σ → P(Q) is the transition function.
In our setting, all states of the automaton are accepting states, i.e.,
our automaton rejects by running into a ‘dead end’. A symbol of
the alphabet a ∈ Σ is a function a : (X ∪ X ′) → D, i.e., a pair
of observations of the system. Let σ = v1, . . . , vn be a trace. The
symbol ai, for i = 1, . . . , n−1 is
\[
a_i(x) =
\begin{cases}
v_i(x) & \text{if } x \in X \\
v_{i+1}(x) & \text{if } x \in X'
\end{cases}
\tag{1}
\]
The learned automaton accepts a word w = a1, . . . , ap over Σ for
p < n if there exists a sequence of states q1, . . . , qp+1 such that:
• q1 = q0
• qi+1 ∈ δ(qi, ai) for i = 1, . . . , p.
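As an illustration of the definitions above, the following Python sketch builds the symbols a_i from a trace and checks acceptance under the convention that every state is accepting. The encoding of valuations as tuples, the dictionary encoding of δ, and the names `symbols` and `accepts` are assumptions made for this example only.

```python
def symbols(trace):
    """Pair each observation v_i with its successor v_{i+1}, as in Eq. (1)."""
    return list(zip(trace, trace[1:]))


def accepts(delta, q0, word):
    """Return True if the NFA can read the whole word without a dead end.

    delta maps (state, symbol) to a set of successor states. Every state is
    accepting, so the only way to reject is to run out of successors.
    """
    current = {q0}
    for a in word:
        current = {q2 for q in current for q2 in delta.get((q, a), ())}
        if not current:
            return False
    return True


# Single variable x1 with trace 1, 2, 3; each valuation is a 1-tuple.
trace = [(1,), (2,), (3,)]
word = symbols(trace)
# Hypothetical one-state automaton with a self-loop for every observed symbol.
delta = {("q1", a): {"q1"} for a in word}
```

A word containing a symbol that never occurred in the trace has no successor state, so `accepts` returns False for it.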
arXiv:2001.05230v2 [cs.FL] 26 Feb 2020
III. MODEL LEARNING WITH PROGRAM SYNTHESIS
Our model-learning algorithm is provided in Algorithm 1. The
algorithm fits into the following overall framework:
A. Tracing infrastructure. This records traces and performs any
required pre-processing.
B. Transition predicate synthesizer. This generates transition pred-
icates from trace data using program synthesis.
C. Model construction algorithm. This iteratively constructs an
automaton from a sequence of predicates obtained from the
previous step and checks its compliance with the sequence input.
We describe these components in detail in the sections that follow.
A. Tracing Setup
We use implementations of the system of interest to obtain trace
data. For most of our experiments, execution traces are produced
simply by instrumenting source code with print statements. This
provides flexibility in getting the required information from the
simulation runs. Target components of interest are identified and trace
statements added at relevant points in the source code based on the
end goal for analysis. Traces can also be produced using any other
means, for example inbuilt tracing or logging frameworks.
B. Transition Predicate Synthesis
For observations that include non-Boolean variables, we need a
way to consolidate the information they represent into expressions
that will serve as transition predicates in our automaton. We use a
program synthesiser to generate a state transition function next(x)
that gives the value in the next state for the given variable x ∈ X . The
method used is an instance of synthesis from examples [9]. There are
many algorithms that implement this synthesis technique. We discuss
choices for the synthesis algorithm in Section VII.
Trace data are used to provide concrete samples for next(x), which
in turn serve as constraints in the synthesis tool (line 3). For example,
consider a system with a single variable x1, and let x1=1, x1=2,
x1=3, x1=4 be a trace of that system. The corresponding examples
for deriving next(x1) are
next(1) = 2, next(2) = 3, next(3) = 4
From these examples, the synthesis tool might generate next(x1) =
x1+1, which we then use to model the behaviour of the variable x1 in
our NFA. Consider another system with two variables X = {x1, x2}
and suppose the next value of x1 depends on x2:
\[
x_1' =
\begin{cases}
x_1 + 1 & \text{if } x_2 = 0 \\
x_1 - 1 & \text{if } x_2 = 1
\end{cases}
\tag{2}
\]
One trace for this system might be this: (x1=1, x2=0),
(x1=2, x2=0), (x1=3, x2=1), (x1=2, x2=0). The correspond-
ing examples for synthesis of a definition of next(x1, x2) are
next(1, 0) = (2, 0), next(2, 0) = (3, 1), and next(3, 1) = (2, 0).
The synthesised function next(x) is used to define a predicate
‘x′ = next(x)’ that relates observations in the current and next states,
and will serve as a transition predicate in our learned automaton.
To tackle the problem of synthesis complexity for large inputs, a
sequence of these predicates is generated by feeding segments of the
trace one after the other into the predicate synthesis algorithm, using
a sliding window (lines 9–13).
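The windowed predicate generation described above can be sketched in Python as follows. This is only an illustration for a single integer variable: the fixed candidate list stands in for a real program synthesiser, and the names `examples_from`, `synthesise`, `predicates` and `CANDIDATES` are hypothetical.

```python
def examples_from(window):
    """Input/output examples for next(x): each value maps to its successor."""
    return list(zip(window, window[1:]))


# Toy search space, tried in order. A real synthesis tool explores a far
# richer space of expressions; this list is an assumption of the sketch.
CANDIDATES = [
    ("x + 1", lambda x: x + 1),
    ("x - 1", lambda x: x - 1),
    ("x + x", lambda x: x + x),
    ("x", lambda x: x),
    ("0", lambda x: 0),
]


def synthesise(examples):
    """Return the first candidate consistent with all examples, else None."""
    for text, f in CANDIDATES:
        if all(f(a) == b for a, b in examples):
            return f"x' = {text}"
    return None


def predicates(trace, w):
    """One predicate per sliding window of w observations (n + 1 - w windows)."""
    return [synthesise(examples_from(trace[i:i + w]))
            for i in range(len(trace) - w + 1)]


P = predicates([1, 2, 3, 4, 3, 2], w=2)
```

With this toy candidate list, a window that mixes increments and decrements has no consistent candidate and yields None; a full synthesiser could instead return a conditional expression.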
Algorithm 1 Model Learning Algorithm
1: procedure GENERATEPREDICATE(σ′)
2:   σ′ = vi, vi+1, . . . , vi+w−1
3:   next(vk) ← vk+1, for k = i, i+1, . . . , (i+w−2)
4:   Synthesize next(x)
5:   return ‘(x′ = next(x))’
6: end procedure
7: procedure GENERATEMODEL(σ)
8:   σ = v1, v2, . . . , vn
9:   w ← sliding window size
10:   Divide σ into {σ1, σ2, . . . , σk}, k = (n+1−w)
11:   for each σi do
12:     pi ← GeneratePredicate(σi)
13:   end for
14:   P ← p1, p2, . . . , pk
15:   M = (q1, p′1, q′1), (q2, p′2, q′2), . . . , (qm, p′m, q′m)
16:   Divide P into {P1, P2, . . . , Pk+1−w} where Pi = pi, pi+1, . . . , pi+w−1
17:   N ← target number of automaton states
18:   Generate the following C program:
19:     Assume 1 ≤ qi, q′i ≤ N , for i = 1, 2, . . . , m
20:     j ← 0
21:     for each Pi do
22:       for y = i to (i+w−1) do
23:         p′j ← py
24:         Assume qj+1 = q′j
25:         j ← (j+1)
26:       end for
27:     end for
28:     wrong_transition ← false
29:     if ∃ i, j ∈ {1, . . . , m} such that (qi = qj ∧ p′i = p′j ∧ q′i ≠ q′j) then
30:       wrong_transition ← true
31:     end if
32:     Assert wrong_transition = true
33:   Run CBMC with the above program as input
34:   if SAT then ▷ Required automaton M does not exist
35:     N ← (N+1)
36:     go to 18
37:   else ▷ Required automaton M exists
38:     Check compliance of M with P
39:     M = learned candidate automaton
40:     l ← length of transition sequence for compliance check
41:     Sl ← set of all transition sequences of length l in M
42:     Pl ← set of all subsequences of P of length l
43:     if Sl ⊈ Pl then ▷ Compliance-check failed
44:       Add sequences in (Sl − Pl) as additional constraints as in line 29
45:       go to 33
46:     else ▷ Compliance-check successful
47:       return M
48:     end if
49:   end if
50: end procedure
C. Model Construction
Our model construction algorithm takes as input a sequence of
predicates P = p1, . . . , pk of the form just described (line 14). Each
predicate is represented by an expression (a syntax tree) over variables
in (X ∪ X ′). The automaton M to be constructed is represented
as an array, each element of which encodes a transition. The i-th
element in the array is a triple comprising the following symbolic
variables: a state variable qi for the state from which the transition
occurs, a variable p′i for the corresponding transition predicate and
a next state variable q′i for the state the system moves to (line 15).
The sequence of predicates is divided into segments using a sliding
window w, a parameter that can be tuned, and unique segments are
processed further (line 16). These predicate segments are later used
to encode transition sequences in the automaton. The parameter w
determines the input size, and consequently the algorithm runtime.
Choosing w = 1 will not capture any sequential behaviour but only
ensures that all trace events appear in the automaton. For model
learning, we would like to choose a value for w that results in a
small input size but is not trivial (w = 1). For our experiments
we performed multiple runs of the algorithm, randomly selecting a
different value for w in the range 1 < w ≤ |P| for each run, and obtained
the same automaton in all scenarios. The strategy we adopt for our
experiments is to fix a window length w = 3 that is small, and yet
can capture interesting trace patterns, to ensure quick results. The
result of segmentation is the set of all unique subsequences of P
of length w. Segmentation significantly improves runtime, especially
for large traces, by leveraging the presence of repeating patterns in
the trace. A detailed discussion of the need for segmentation and its
effect on scalability is given in Section V.
Fig. 1: USB Slot state machine provided in (a) Intel datasheet [10] and (b) model learnt by our framework.
(Figure not reproduced. Extracted labels: states q1 (start), q2, q3, q4; edges CR ENABLE SLOT, CR DISABLE SLOT, CR RESET DEVICE, CR ADDR DEV (BSR=0), CR CONFIG END, CR STOP END.)
To construct the model, we search systematically for an N -state
NFA whose behaviours include all of the unique segments previously
identified and that has at most one transition from any state labelled
with any given predicate. For a given N , we hypothesize that no such
automaton exists and use a model checker to check the hypothesis.
We use the C Bounded Model Checker (CBMC) [11] for this check, and
therefore encode the hypothesis as a C program (lines 18–32). We set the number
of automaton states by restricting the state variables of M to take
values between 1 and N (line 19). Lines 21–27 ensure that the
automaton always includes the corresponding transition predicates in
the sequence they appear in the segments of P. A wrong_transition
flag, used to detect ‘invalid transitions’ in the automaton, is set to
true when the automaton contains transitions (q, p, q′) and (q, p, q′′)
with q′ ≠ q′′ (lines 29–31).
The program, together with the assertion wrong_transition =
true (line 32), is fed to CBMC (line 33), which checks whether the assertion,
and hence the hypothesis, holds. If SAT (line 34), then for
every assignment of values in the range 1 to N to the state variables qi and
q′i of M, the wrong_transition flag is true, and hence the
assertion is ‘satisfied’. This implies that there is no N-state automaton
that meets our constraints; in this case we increment N and repeat
the search (lines 34–36). We begin model construction with N = 2
and increase the number of states by 1 if such an automaton cannot
be learned. This ensures that we learn the smallest automaton that
contains all subsequences of P of length w and meets all constraints.
A counterexample to the assertion is an assignment of values
to state variables of M that encodes an N -state automaton that
meets the desired conditions (line 39). Once CBMC has constructed
a candidate model, we check its compliance with P by looking
for invalid transition sequences in M (lines 38–43). A transition
sequence is said to be invalid if it is not a subsequence of P . We
check if all transition sequences in the model of a given length l
are subsequences of P (line 43). The parameter l can be tuned to
change the degree of generalisation. However, a higher value for l
implies tighter constraints on the model moving towards a more exact
representation. It is known that learning exact automata from trace
data is NP-complete [12]. Hence, we have used l = 2 to ensure that it
is not too complex for the model checker to solve but at the same time
does not over-generalise to fit the trace. We encode invalid sequences
as additional constraints on the automaton and repeat the search
(lines 43–45). This refinement loop gleans further information from
sequence P . The algorithm returns the target N -state automaton M
if such an automaton exists and the compliance check is successful.
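The search described in this section can be illustrated with a small Python sketch. It is not the CBMC encoding: exhaustive enumeration of state assignments stands in for the model checker, the compliance check is applied directly to each candidate rather than through a refinement loop, and all function names and the example predicate sequence are hypothetical. It is only practical for very short predicate sequences.

```python
from itertools import product


def segments_of(P, w):
    """Unique length-w windows of P, in order of first appearance."""
    return list(dict.fromkeys(tuple(P[i:i + w]) for i in range(len(P) - w + 1)))


def transitions_from(assignment, segments):
    """Turn a flat state assignment into (source, predicate, target) triples.

    A segment of w predicates consumes w + 1 consecutive state values, so the
    target of one transition is the source of the next within the segment.
    """
    transitions, pos = set(), 0
    for seg in segments:
        states = assignment[pos:pos + len(seg) + 1]
        pos += len(seg) + 1
        transitions.update(zip(states, seg, states[1:]))
    return transitions


def has_wrong_transition(transitions):
    """True if some (state, predicate) pair leads to two different targets."""
    targets = {}
    for q, p, q_next in transitions:
        if targets.setdefault((q, p), q_next) != q_next:
            return True
    return False


def label_paths(transitions, l):
    """All predicate sequences of length l along paths in the automaton."""
    paths = {((p,), q_next) for _, p, q_next in transitions}
    for _ in range(l - 1):
        paths = {(labels + (p,), q_next)
                 for labels, end in paths
                 for q, p, q_next in transitions if q == end}
    return {labels for labels, _ in paths}


def learn(P, w=3, l=2, max_states=4):
    """Smallest N (starting from 2) with a valid, compliant automaton."""
    segments = segments_of(P, w)
    allowed = set(segments_of(P, l))
    slots = sum(len(seg) + 1 for seg in segments)
    for n in range(2, max_states + 1):
        for assignment in product(range(1, n + 1), repeat=slots):
            t = transitions_from(assignment, segments)
            if not has_wrong_transition(t) and label_paths(t, l) <= allowed:
                return n, t
    return None


# Hypothetical predicate sequence with a period of three.
P = ["x' = x + 1", "x >= 3", "x' = 0"] * 2
n_states, automaton = learn(P, w=2, l=2)
```

For this input no two-state automaton passes the compliance check, so the search moves on to N = 3 and finds a three-state cycle.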
IV. BENCHMARKS
We demonstrate our model learning approach from execution traces
for six examples. We compare our approach against the traditional
state-merge algorithm and provide experimental evidence to demon-
strate scalability of our algorithm. Four of the six benchmarks use
the QEMU virtual platform to emulate an x86 system, including the
hardware components. The virtual platform runs a full CentOS Linux
distribution and is given an application to exercise system behaviours
of interest. The other two benchmarks are artificial and enable us to
benchmark particular aspects of our method.
USB xhci Slot State Machine. The Extensible Host Controller
Interface (xhci) specifies a controller’s register-level operations for
USB 2.0 and above. In this example we look specifically at slot-
level operations that take place when we access a virtual USB
storage device as implemented in QEMU x86 platform emulation.
The framework learns the automaton given in Fig. 1b resembling the
Intel datasheet diagram in Fig. 1a.
It is worth noting that the model learning algorithm is able to
generate an accurate representation of system behaviours that are
exercised under a given application load. Some transitions in Fig. 1a
do not appear in the learned model, either because QEMU does not
implement those scenarios or because the application load does not drive
the system into those scenarios. The models generated thus provide
valuable coverage information.
QEMU USB Interface Emulation. We use the same setup as above
but record all interface event exchanges that take place when a virtual
USB storage device is attached to the virtual platform. The resulting
trace records the series of ring fetch and ring write operations between
the command ring and event ring of the xhci protocol.
Our algorithm learned a concise seven-state automaton (Fig. 3)
while the smallest model generated by state-merge approaches had
91 states (Table II).
Fig. 2: QEMU Serial I/O Port. (a) Model generated by State Merge. (b) Model learnt by our framework.
(Figure not reproduced. Extracted labels for (b): states q1 (start) to q6; edges read, x′ = x − 1, write, x′ = x + 1, reset, x′ = 0.)
Fig. 3: Model of USB interface learnt by our framework.
(Figure not reproduced. Extracted labels: states q1 (start) to q7; edges xhci write, xhci ring fetch, ErTransfer, ErCC, ErPSC, CCSuccess, TRData, TRSetup, TRBReserved, CrAD, CrCE, CrES, TRNormal, TRStatus.)
Fig. 4: Model of an anti-windup integrator learnt by our framework.
(Figure not reproduced. Extracted labels: states q1 (start), q2, q3; edges op′ = op + ip, reset, op′ = 0, (op = 5 ∧ ip = 1) ∨ (op = −5 ∧ ip = −1), op′ = op.)
QEMU Serial I/O Port. We used QEMU’s x86 emulation of a serial
I/O port and recorded changes in the queue length. The trace contains
(numerical) queue length data along with (Boolean) read, reset and
write events on the queue.
Our algorithm was able to generate expressions for transition
predicates as given in Fig. 2b. We were unable to take the queue
to its full capacity due to very quick read-writes and frequent resets.
Counter. We trace a simple program that counts from 1 up to a threshold
T, which we set to 128; after it reaches the threshold it counts
back down to 1. This is repeated N times. We observe the value of
the counter. The synthesis component of our algorithm automatically
generates the expected transition predicates (x′ = x+1), (x ≥ 128),
(x ≤ 1) and (x′ = x− 1) using values from the trace (Fig. 5).
Fig. 5: Model of a counter with threshold 128 learnt by our framework.
(Figure not reproduced. Extracted labels: states 1 (start), 2, 3, 4; edges x′ = x + 1, (x ≥ 128), x′ = x − 1, (x ≤ 1).)
This benchmark is particularly interesting as it has constants, like
the value of T , in the predicates. The ability to automatically produce
these constants depends on the synthesis tool used and the approach
to synthesis. We discuss this in detail in Section VII.
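For concreteness, a trace of the kind used in the counter benchmark could be produced by a program along the following lines. This is a hypothetical reconstruction, not the benchmark program itself: the function name `counter_trace` and the `steps` parameter (used here in place of a repetition count) are assumptions.

```python
def counter_trace(threshold, steps):
    """Observe a counter that climbs from 1 to threshold and back down to 1."""
    x, direction, trace = 1, 1, []
    for _ in range(steps):
        trace.append(x)          # record the observation for this time step
        if x >= threshold:       # reached the threshold: start counting down
            direction = -1
        elif x <= 1:             # back at the bottom: start counting up
            direction = 1
        x += direction
    return trace


small = counter_trace(threshold=4, steps=8)
long_run = counter_trace(threshold=128, steps=600)
```

Every consecutive pair of observations in such a trace satisfies either x′ = x + 1 or x′ = x − 1, and the direction changes only at the values 1 and T, which is why the constants appear in the learned predicates.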
Integrator. Control applications frequently track an integral of an
input signal. We implement an anti-windup integrator where the
computed output op is saturated at predefined thresholds, 5 and −5.
The input ip is restricted to take values {1, 0,−1}. The trace contains
valuations of (ip, op) pairs at discrete time steps.
The algorithm generates complex transition predicates (op′ = op + ip),
(op′ = op) and (op = 5 ∧ ip = 1) ∨ (op = −5 ∧ ip = −1). The
transitions on (op′ = op + ip) encode integrator behaviour outside
saturation. In the learned model (Fig. 4), transitions on (op = 5 ∧
ip = 1) ∨ (op = −5 ∧ ip = −1) are always followed by transitions
on (op′ = op), thus accurately capturing behaviour at saturation.
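The integrator behaviour described above can be sketched as follows, assuming a simple clamped accumulator. The names `integrator_step` and `integrator_trace`, the initial output of 0, and the order of the (ip, op) pair are assumptions of this example.

```python
def integrator_step(op, ip, limit=5):
    """Add the input to the output and saturate the result at +/- limit."""
    return max(-limit, min(limit, op + ip))


def integrator_trace(inputs, op=0):
    """Record (ip, op) pairs at discrete time steps."""
    trace = []
    for ip in inputs:
        trace.append((ip, op))
        op = integrator_step(op, ip)
    return trace


# Drive the output into positive saturation, then back down.
trace = integrator_trace([1] * 7 + [-1] * 3)
```

Once op reaches 5 with ip = 1, the next output equals the current one, which corresponds to the op′ = op transition that follows the saturation predicate in the learned model.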
Real Time Linux Kernel. We generated an automaton describing
the behaviour of threads in the Linux PREEMPT_RT kernel on a
single-core system. This is motivated by the work in [13], [14]
where hand-drawn models of the kernel are used as monitors for
runtime kernel verification. For this example, we used the built-in
Linux tracing infrastructure ftrace to trace scheduler-related calls
made by the thread under analysis, as described in [14]. We used
the Linux PREEMPT_RT kernel version 5.0.7-rt5 on a single-core
QEMU-emulated x86 machine for our experiments.
Fig. 6: Model of RT-Linux Kernel Thread Scheduling learnt by our framework.
(Figure not reproduced. Extracted labels: states q1 (start) to q8; edges sched switch in, sched waking, sched switch suspend, sched switch preempt, sched entry, set state sleepable, set state runnable, set need resched.)
Fig. 7: Log–log plot comparing runtime for segmented and non-segmented trace input for the integrator example.
(Figure not reproduced. x-axis: Trace Length, 2^6 to 2^15; y-axis: Runtime (sec), up to 10^5 with a timeout marker; series: Segmented, Non-segmented.)
Initial attempts at modelling thread behaviour with our algorithm,
using the pi_stress tests from the rt-tests suite as system load, revealed
that some states in the hand-drawn model provided in [14] are not
covered by the given load. On running an additional kernel module
to cover these corner cases, we obtain the automaton in Fig. 6. This
experiment provides evidence in support of potentially using the
models learned by our algorithm for functional test coverage analysis.
V. THE BENEFIT OF TRACE SEGMENTATION
To learn models from execution traces we require efficient and scal-
able mechanisms for mining useful information from large amounts of
trace data. More often than not, execution traces of a system contain
recurring patterns, which we exploit to speed up model learning.
To evaluate the benefit of our segmentation technique, we give the
results of a runtime comparison of the algorithm for segmented and
non-segmented trace inputs for all six examples (Table I). We observe
that the segmentation enables our algorithm to scale: Fig. 7 is a plot
of the runtime against trace length for exponentially increasing trace
lengths for the integrator example. Segmentation breaks down a large
problem into multiple small instances that have manageable runtime.
We leverage the presence of trace patterns to significantly reduce
execution time as it is sufficient to process repeating segments once.
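The effect of repeating patterns can be made concrete with a short Python sketch that counts unique windows as the sequence length grows. The periodic predicate sequence and the names `unique_windows` and `periodic` are hypothetical; real traces are rarely perfectly periodic, so the count would typically grow slowly rather than stay constant.

```python
def unique_windows(seq, w):
    """Set of distinct length-w windows in seq."""
    return {tuple(seq[i:i + w]) for i in range(len(seq) - w + 1)}


def periodic(pattern, length):
    """Repeat pattern and truncate to the requested length."""
    reps = length // len(pattern) + 1
    return (pattern * reps)[:length]


# Hypothetical predicate sequence with a period of eight.
PATTERN = (["x' = x + 1"] * 3 + ["x >= 4"]
           + ["x' = x - 1"] * 3 + ["x <= 1"])

# Number of unique windows (w = 3) for lengths 2^6 .. 2^15.
counts = {2 ** e: len(unique_windows(periodic(PATTERN, 2 ** e), 3))
          for e in range(6, 16)}
```

The number of windows in the full sequence grows linearly with its length, while the number of unique windows passed on to model construction stays at eight for this input.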
Example           N   Trace Length   Full Trace (s)   Segmented Trace (s)
USB Slot          4   39             14.1             9
USB Attach        7   259            2249.5           915.4
Counter           4   447            249.1            95.9
Serial I/O Port   6   2076           23590.5          60.2
Linux Kernel      8   20165          >16 hours        516.3
Integrator        3   32768          >16 hours        3495.6
TABLE I: Runtime comparison for segmented and non-segmented
trace input. For a fair comparison, we begin learning with number of
states equal to N .
VI. COMPARISON WITH STATE MERGE ALGORITHMS
State merge algorithms are the established approach to model
generation from traces. Traces are first converted into a Prefix Tree
Acceptor (PTA). Model inference techniques are used to identify pairs
of equivalent states to be merged in the hypothesis model. One of the
most popular and accurate inference techniques is Evidence-Driven
State Merging (EDSM) [2]. The MINT (Model Inference Technique)
tool [5], [15] implements a variant of EDSM using data classifiers
to classify trace events based on the next event. States for which the
classifier predicts the same next event are merged. It also provides
support for the traditional kTails approach to state merge [1].
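For comparison, the classical state-merge starting point can be sketched in a few lines of Python: build a prefix tree acceptor and group its states by their k-tails. This is an illustrative reading of the kTails idea, assuming that two states are k-equivalent when the sets of event sequences of length at most k that can follow them coincide; definitions vary in the literature, and the sketch does not model MINT or EDSM. All names are hypothetical.

```python
from collections import defaultdict


def build_pta(traces):
    """Prefix tree acceptor: state 0 is the root; children[s][event] -> state."""
    children = [{}]
    for trace in traces:
        state = 0
        for event in trace:
            if event not in children[state]:
                children.append({})
                children[state][event] = len(children) - 1
            state = children[state][event]
    return children


def k_tails(children, state, k):
    """Event sequences of length at most k that can follow the given state."""
    tails = {()}
    if k > 0:
        for event, nxt in children[state].items():
            tails |= {(event,) + t for t in k_tails(children, nxt, k - 1)}
    return frozenset(tails)


def k_equivalence_classes(children, k):
    """Group PTA states whose k-tails coincide (candidates for merging)."""
    classes = defaultdict(list)
    for state in range(len(children)):
        classes[k_tails(children, state, k)].append(state)
    return sorted(classes.values())


pta = build_pta([["read", "write", "read", "write"]])
classes = k_equivalence_classes(pta, k=1)
```

With k = 1 the five PTA states fall into three classes; larger values of k distinguish more states and therefore generalise less.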
A runtime comparison of MINT against our approach reveals
that the state-merge algorithm is significantly faster (Table II) but
generates large automata that are difficult to comprehend, as shown
in Fig. 2a. By contrast, our framework learns models that are much
more succinct (Fig. 2b) and capture system behaviour accurately. The
MINT tool was unable to produce models from long traces of length
>20,000 for the Linux kernel and integrator examples whereas our
approach successfully produced concise automata in both cases.
                                 Runtime (s)                    Number of States
Example           Trace Length   State Merge   Model Learning   State Merge   Model Learning
USB Slot          39             8.7           14.5             6             4
USB Attach        259            35.1          3615.1           91            7
Counter           447            12.1          98.6             377           4
Serial I/O Port   2076           28.6          137.4            28            6
Linux Kernel      20165          ≈ 5 h         4173.6           no model      8
Integrator        32768          ≈ 5 h         3497.2           no model      3
TABLE II: Runtime analysis of state-merge vs. model learning.
VII. DISCUSSION OF PROGRAM SYNTHESIS ENGINES
Virtually all tools that perform program synthesis implement a
form of Counterexample-Guided Inductive Synthesis (CEGIS) [8].
The program that is generated is usually required to conform to
a grammar, which is given as part of the problem description.
Tools that require this grammar implement Syntax-Guided Synthesis
(SyGuS) [16]. We experimented with two program synthesis tools
for generating transition predicates for the automaton: CVC4 version
1.6 [17], [18], which by default employs SyGuS, and fastsynth,
which is based on the work done in [19].
The SyGuS-based approach requires a grammar for the program.
The key effort is to determine the constants that are required; they
have to be adjusted manually for every model. Fastsynth also
implements CEGIS but does not rely on a user-specified grammar
to restrict the search space. Fastsynth ignores any grammar given as
part of the problem and produces the smallest function that satisfies
the constraints. Any constants that may be required are generated
automatically.
CVC4 also implements an alternative algorithm that does not
require syntax guidance; however, this algorithm produces trivial solutions.
For example, given the trace sequence 1, 2, 4, 8, CVC4 generates
(ite (= x 4) 8 (ite (not (= x 2)) 2 4)) whereas fastsynth produces the
expression x + x, which is a better fit for our problem. The type
of transition predicates synthesised depends on the ability of the
program synthesis tool and the approach to synthesis. A suitable
synthesis tool can be chosen based on the target models we wish
to obtain and the application domain.
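The difference between the two solutions quoted above can be checked directly. The following Python sketch transcribes both expressions and evaluates them on the examples derived from the sequence 1, 2, 4, 8; the function names are hypothetical.

```python
def lookup_solution(x):
    """Literal transcription of (ite (= x 4) 8 (ite (not (= x 2)) 2 4))."""
    if x == 4:
        return 8
    return 2 if x != 2 else 4


def compact_solution(x):
    """The expression x + x."""
    return x + x


# Examples next(1) = 2, next(2) = 4, next(4) = 8 from the sequence 1, 2, 4, 8.
examples = [(1, 2), (2, 4), (4, 8)]
both_fit = all(lookup_solution(a) == b and compact_solution(a) == b
               for a, b in examples)
```

Both functions agree with every example, but they diverge on unseen inputs: for x = 8 the case-split expression returns 2, whereas x + x returns 16 and continues the pattern.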
VIII. RELATED WORK
Manually creating abstract system models is time-consuming and
error-prone, and this has prompted numerous research efforts aimed
at automated model learning. The most common approach to au-
tomatically reverse engineer models from execution traces is state
merging [1]. The process involves converting traces into prefix tree
acceptors (PTA), and then applying model inference techniques to
determine which states are to be merged. In the traditional kTails
approach two states in the PTA are merged if they are k-equivalent.
The parameter k is used to change the degree to which the model
generalises. A variant of the algorithm [5] additionally uses data
classifiers to determine state equivalence.
Conventional automata learning approaches are partial because
they fail to model how system variables change during execution. An
extension of the state merge algorithm [7] generates “computational”
state machines. Data update functions over transitions are automat-
ically generated using genetic programming. This method requires
additional trace data, over and above the trace data used to generate
the initial model, for transition predicate inference. The GK-Tails [6]
algorithm integrates Daikon [20] with the kTails approach to derive
transition predicates for Extended Finite State Machines (EFSM) that
represent software behaviour. The type of expressions generated is,
however, restricted. kTails-based algorithms use instances of only
positive behaviour and hence run the risk of over-generalising [12].
A popular model inference algorithm, Evidence-Driven State
Merge (EDSM) [2], overcomes the problem of over-generalisation by
using both positive and negative instances of behaviour to determine
equivalence of states to be merged based on statistical evidence. In an
extension of this work [4], finite automata inference is mapped to a
graph-colouring problem based on the red-blue EDSM framework [2].
Models are generated by converting the problem into SAT and using
state-of-the-art SAT solvers to get an optimal solution.
To avoid over-generalisation in the absence of labelled data, the
EDSM algorithm was improved to incorporate inherent temporal
behaviour in the models [3], [21]. Models are checked against LTL
properties to validate state merges as they are encountered. In an
attempt to model and verify software systems [22], state machines
describing software behaviour are generated by checking a hypothet-
ical, manually drawn model against the code. The user specifies a set
of states and state invariants which are translated into relevant pre
and post conditions in the code. State merge based algorithms do not
focus on producing the most succinct models but rather produce a
good enough approximation that conforms to the trace [23].
SAT-based approaches to model generation have thus gained pop-
ularity due to their ability to produce exact state machines [23],
[24]. Similarly, several algorithms have been developed that use SAT
together with state merge to generate automata from positive and
negative traces [4], [25], [26]. In general, these methods work by first
representing the problem using Boolean variables and generating a
Boolean formula that constrains them. A SAT checker then generates
a hypothesis model, which is verified using LTL properties of the
system. The SAT-based approach has been put to practice to construct
plant models as Moore machines using behavioural instances, with
LTL properties used as constraints [25], [26].
A classic automata learning technique, Angluin’s L* algo-
rithm [27], employs a series of equivalence and membership queries
to an oracle, the results of which are used to construct the automaton.
When the trace does not explicitly contain transition predicates, L*
fails to learn behaviour seen in the data. The absence of an oracle
often restricts the use of this algorithm for abstracting large systems.
IX. CONCLUSION AND PROSPECTS
In this paper we have outlined a novel, scalable approach, based on program
synthesis, to learning models from long execution traces. The
models produced are concise and accurately represent the system’s
behaviour. Our approach can handle traces that contain more than
just Boolean events by synthesising expressions for system variable
state predicates. We have compared our approach with state-merge
algorithms for a range of benchmarks and evaluated scalability of our
algorithm. Our abstract models have several potential applications:
they can summarize which aspects of system behaviour have been
covered by a suite of tests, they provide starting points for model-
based test generation, perhaps to close coverage holes, and they could
be used as candidate inductive invariants [28], which, in turn, could
be used to prove properties of the system.
Going forward, we wish to look at these potential applications
and address the question of how to efficiently exercise the system to
produce relevant traces. We are particularly interested in its utility in
invariant synthesis for property proving using the models as candidate
invariants in the inductive invariant synthesis refinement loop.
Although we demonstrate our approach on systems given as virtual
platforms, we wish to explore its value in other domains as well.
X. ACKNOWLEDGEMENTS
This research was supported in part by a grant from the Semi-
conductor Research Corporation, Task 2707.001 and the Jason Hu
scholarship.
REFERENCES
[1] A. W. Biermann and J. A. Feldman, “On the synthesis of finite-state
machines from samples of their behavior,” IEEE Trans. Comput., vol. 21,
no. 6, pp. 592–597, Jun. 1972.
[2] K. J. Lang, B. A. Pearlmutter, and R. A. Price, “Results of the Abbadingo
One DFA learning competition and a new evidence-driven state merging
algorithm,” in Grammatical Inference. Springer, 1998, pp. 1–12.
[3] N. Walkinshaw and K. Bogdanov, “Inferring finite-state models with
temporal constraints,” in 2008 23rd IEEE/ACM International Conference
on Automated Software Engineering, Sep. 2008, pp. 248–257.
[4] M. J. H. Heule and S. Verwer, “Software model synthesis using
satisfiability solvers,” Empirical Software Engineering, vol. 18, no. 4,
pp. 825–856, Aug 2013.
[5] N. Walkinshaw, R. Taylor, and J. Derrick, “Inferring extended finite
state machine models from software executions,” Empirical Software
Engineering, vol. 21, no. 3, pp. 811–853, Jun 2016.
[6] D. Lorenzoli, L. Mariani, and M. Pezzè, “Automatic generation of
software behavioral models,” in International Conference on Software
Engineering (ICSE), 2008, pp. 501–510.
[7] N. Walkinshaw and M. Hall, “Inferring computational state machine
models from program executions,” in 2016 IEEE International Confer-
ence on Software Maintenance and Evolution (ICSME), Oct 2016, pp.
122–132.
[8] S. Gulwani, A. Polozov, and R. Singh, “Program synthesis,” in Foun-
dations and Trends in Programming Languages, vol. 4. NOW, August
2017, pp. 1–119.
[9] S. Gulwani, “Synthesis from examples,” in 3rd Workshop on Advances
in Model-Based Software Engineering (WAMBSE), 2012.
[10] Intel, eXtensible Host Controller Interface for Universal Serial Bus
(xHCI), Nov. 2017. [Online]. Available:
https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf
[11] E. Clarke, D. Kroening, and F. Lerda, “A tool for checking ANSI-C
programs,” in Tools and Algorithms for the Construction and Analysis
of Systems (TACAS), vol. 2988. Springer, 2004, pp. 168–176.
[12] E. M. Gold, “Complexity of automaton identification from given data,”
Information and Control, vol. 37, pp. 302–320, 1978.
[13] D. de Oliveira, T. Cucinotta, and R. S. de Oliveira, “Efficient formal
verification for the Linux kernel,” in Software Engineering and Formal
Methods. Springer International Publishing, 2019, pp. 315–332.
[14] D. Bristot de Oliveira and S. Sant’anna, “Modeling the behavior of
threads in the PREEMPT_RT Linux kernel using automata,” in Proceedings
of the Embedded Operating System Workshop (EWiLi), Nov. 2018.
[15] N. Walkinshaw, MINT framework Github repository, 2018. [Online].
Available: https://github.com/neilwalkinshaw/mintframework
[16] R. Alur, R. Bodik, G. Juniwal, M. M. K. Martin, M. Raghothaman,
S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa,
“Syntax-guided synthesis,” in Formal Methods in Computer-Aided De-
sign (FMCAD), 2013, pp. 1–8.
[17] CVC4. [Online]. Available: http://cvc4.cs.stanford.edu/web/
[18] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King,
A. Reynolds, and C. Tinelli, “CVC4,” in Computer Aided Verification
(CAV), vol. 6806. Springer, 2011, pp. 171–177.
[19] A. Abate, C. David, P. Kesseli, D. Kroening, and E. Polgreen, “Coun-
terexample guided inductive synthesis modulo theories,” in Computer
Aided Verification. Springer, 2018, pp. 270–288.
[20] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin, “Dynamically
discovering likely program invariants to support program evolution,” in
International Conference on Software Engineering (ICSE). ACM, 1999,
pp. 213–224.
[21] N. Walkinshaw, K. Bogdanov, M. Holcombe, and S. Salahuddin, “Re-
verse engineering state machines by interactive grammar inference,” in
Proceedings of the 14th Working Conference on Reverse Engineering,
ser. WCRE ’07. IEEE Computer Society, 2007, pp. 209–218.
[22] W. Said, J. Quante, and R. Koschke, “Reflexion models for state machine
extraction and verification,” in 2018 IEEE International Conference on
Software Maintenance and Evolution (ICSME), Sep. 2018, pp. 149–159.
[23] V. Ulyantsev, I. Buzhinsky, and A. Shalyto, “Exact finite-state ma-
chine identification from scenarios and temporal properties,” CoRR, vol.
abs/1601.06945, 2016.
[24] V. Ulyantsev and F. Tsarev, “Extended finite-state machine induction
using SAT-solver,” in International Conference on Machine Learning
and Applications and Workshops, 2011, pp. 346–349.
[25] I. Buzhinsky and V. Vyatkin, “Automatic inference of finite-state plant
models from traces and temporal properties,” IEEE Trans. Ind. Informat.,
vol. 13, no. 4, pp. 1521–1530, Aug 2017.
[26] I. Buzhinsky and V. Vyatkin, “Modular plant model synthesis from
behavior traces and temporal properties,” in 2017 22nd IEEE International
Conference on Emerging Technologies and Factory Automation (ETFA),
2017, pp. 1–7.
[27] D. Angluin, “Learning regular sets from queries and counterexamples,”
Inf. Comput., vol. 75, no. 2, pp. 87–106, Nov. 1987.
[28] E. M. Clarke, O. Grumberg, D. Kroening, D. Peled, and H. Veith, Model
Checking, 2nd ed. MIT Press, 2018.
