Modeling and Verification for Timing Satisfaction of Fault-Tolerant
  Systems with Finiteness by Cheng, Chih-Hong et al.
Chih-Hong Cheng∗, Christian Buckl‡, Javier Esparza† Alois Knoll∗
∗Unit 6: Robotics and Embedded Systems, Department of Informatics, TU Munich, Germany
∗Unit 11: Theoretical Computer Science, Department of Informatics, TU Munich, Germany
†Fortiss GmbH, Germany
Email:{chengch,buckl,esparza,knoll}@in.tum.de
Abstract
The increasing use of model-based tools enables further use of formal verification techniques in the context of distributed
real-time systems. To avoid state explosion, it is necessary to construct verification models that focus on the aspects under
consideration.
In this paper, we discuss how we construct a verification model for timing analysis in distributed real-time systems.
We (1) give observations concerning the restrictions of timed automata in modeling these systems, (2) formulate mathematical
representations of how to perform model-to-model transformation to derive verification models from system models, and
(3) propose theoretical criteria for reducing the model size. The latter is particularly important because, for the verification
of complex systems, an efficient model reflecting the properties of the system under consideration is as important as
the verification algorithm itself. Finally, we present an extension of the model-based development tool FTOS, designed to
develop fault-tolerant systems, to demonstrate our approach.
I. INTRODUCTION
The complexity of distributed real-time systems is growing rapidly; model-based development tools are used to
accelerate the development process and increase the quality of the produced code. In addition, it is possible to integrate
formal verification as an analysis technique into these tools.
Currently, the standard verification process first translates system models into verification models
and then verifies the relevant properties with verification engines using special algorithms. In the verification community,
researchers focus on tighter theoretical complexity bounds or computationally faster algorithms to reduce the required
time for verification. Nevertheless, when it comes to the verification of complex systems, an efficient model reflecting the
properties of the system under consideration becomes essential. By an efficient model, we mean a model containing "just
enough" information about the system behavior regarding these properties. In fact, an inefficient model with irrelevant
details can render the verification intractable.
Within this paper, we introduce an approach for the construction of such an efficient model for the verification of
timing assumptions and constraints. The approach is presented in, but not restricted to, the context of FTOS [5], a
model-based development tool for the design of fault-tolerant systems.
In our presentation, we first introduce FTOS, mention insights regarding differences in comparison to other development
tools, and propose our two-phase verification methodology (sec. II). Then, based on FTOS and timed automata [3],
we describe the model construction process, focusing on the aspects of expressiveness, modification, and
efficiency.
• (Expressiveness) We give observations regarding the restrictions of timed automata in constructing models of real-time
systems (sec. III-A, III-B, III-D); these observations are valid not only in the context of FTOS, but also apply to
other systems.
• (Modification) We formulate mathematical representations of how to perform model modification to derive verification
models from system models (sec. III-C).
• (Efficiency) With the understanding of (1) the complexity of verification and (2) our problem structure, we propose
theoretical criteria for constructing an efficient model, such that existing model
checkers can generate results within reasonable time (sec. IV).
Finally, we report our preliminary implementation (sec. V), mention related work (sec. VI), and conclude this paper
(sec. VII).
II. FTOS AND MOTIVATING EXAMPLES
A. Introduction to FTOS
FTOS is a model-based development tool for fault-tolerant real-time systems that alleviates
the designers' burden by offering code generation for non-functional aspects with high extensibility.
The conceptual modeling in FTOS uses multi-aspect techniques comprising four different perspectives:
• Hardware Model: The hardware model specifies the hardware used, including specifications of electronic control
units (ECUs) and the interconnecting network.
arXiv:0905.3951v3 [cs.DC] 21 Oct 2009
Figure 1. Behavioral models and architectures. [Figure: a behavioral model and a hardware architecture, related by analysis and mapping (design space exploration). Labels: ECU1, ECU2, ECU3 with schedulers Schd1, Schd2, Schd3; functions R1, R2, R3, G1, A1. Legend: aperiodic tasks, Giotto computation, FTOS computation, message send and message receive via the yellow network.]
• Software Model: The underlying model of computation in FTOS shares large similarities with that of Giotto [8],
which is based on the concept of Logical Execution Times. A designer should specify tasks, ports, inputs, outputs,
and jobs.
• Fault Model: The fault model specifies the fault hypothesis of the system, which includes the set of fault
containment units (FCUs) (possible faults concerning locations, types, durations), and the set of fault configurations
(possible simultaneously activating FCUs). Examples for the fault hypothesis are:
1) A network link can lose messages (fault type: MsgLoss) with a minimum interval between consecutive
occurrences equal to 3 milliseconds (see footnote 1).
2) A software task can produce errors (fault type: WrongResult) due to a fault within an associated sensor;
once the fault has occurred, it will not be corrected unless this is explicitly done by the user or the fault-tolerance mechanism.
The minimum interval of correct operation between two consecutive faults of the sensor is expected to
be 500 milliseconds (see footnote 2).
• Fault-tolerance Model: The fault-tolerance model specifies methods to detect errors and to repair and restore the
system.
During code generation, FTOS selects, adapts and combines pre-implemented code templates based on model features.
A detailed description of FTOS can be found in [5].
B. General Settings and Examples
The concepts presented in this paper apply not only to FTOS, but also to a range of other related projects, such as
Giotto [8] or event-driven tasks with fixed deadlines. Figure 1 shows the different models of execution. An aperiodic or
sporadic function is event driven; when such an event happens, a deadline is assigned to the task handling the event.
Giotto functions are functions that interact synchronously at macro step level (logical level), while at micro step level
the execution is asynchronous. For detailed description of Giotto and the concept of logical execution time, see [8].
FTOS functions are extensions of Giotto functions. Intuitively they are equipped with fault-tolerance abilities such that
the system can resist faults defined by the fault model. In fig. 1, three redundant copies (R1, R2, R3) are deployed on
the three machines (ECU1, ECU2, ECU3).
The figure also shows the necessity of mapping the behavioral model (in FTOS: software model) to the architecture
model (in FTOS: hardware model). Note that, in general, a design space exploration is needed to find such a mapping.
For details, we refer readers to articles regarding platform-based design [10]. Since this mapping is specified in FTOS
by the developer, our analysis can start from a given selection of hardware and software settings.
C. Verification Goals
The main property of fault-tolerant systems that needs to be verified is the ability to withstand the assumed faults.
The fault assumptions are summarized in the fault hypothesis (in FTOS: fault model), which defines faults regarding their
location, effect, and frequency.
Footnote 1: This minimum interval is called least time between faults (LTBF) in FTOS and is derived from the required probability of the system to withstand
the fault.
Footnote 2: In (1) the message loss is transient, and in (2) computation errors caused by hardware faults are permanent.
Figure 2. Internal nondeterminism due to scheduling differences. [Figure: timelines of M1, M2, M3, and M′3 showing communication send, communication receive, and decide actions; the schedule of M3 may shift to that of M′3 due to OS scheduling policies, manipulation of other functions, and so on.]
The verification of such systems is hindered by two aspects: deadline violations and non-determinism due to e.g.
imperfect synchronization of redundant units.
1) In ordinary systems, correctness relies on the assumption that a scheduling never leads to deadline violations
(without loss of generality, we assume that deadlines specified in our model are hard). Nevertheless, in fault-
tolerant systems, the constraint can be loosened. Due to replication, a deadline violation of one unit might be
tolerated. In fact, the violation of the deadline can be categorized as an occurrence of a fault defined in the
fault model. This brings dramatic differences between fault-tolerant systems and ordinary systems, i.e., deadline
violation is feasible or acceptable provided that there exists a fault-tolerance mechanism such that the effect of
fault can be eliminated.
2) On the other hand, replication also introduces further difficulties. In ordinary Giotto systems, internal determinism
is guaranteed, meaning that two deployments having the same relative ordering at the micro step level will have
the same behavior, irrespective of the absolute timing. Unfortunately, internal determinism is not maintained
unless additional constraints are placed on FTOS functions. Consider fig. 2, where M1, M2 and M3 are three
deployments. The send action broadcasts messages to the other machines regarding the sender's liveness. Ideally, when no
error happens, each machine should obtain a consistent view of the system. However, when the scheduling
of M3 changes to that of M′3, with zero-time transmission, the result is an inconsistent view at M1 and M2.
This brings semantic incompatibility between different deployments.
To solve these problems, we thus propose the concept called deterministic assumption [6]. Intuitively, the goal is
to assume that the implementation of fault tolerance mechanisms will always provide a consistent view for all correct
machines regardless of deadline violation and scheduling issues. In practice, this will place constraints regarding the
earliest and latest arrival time between messages sent, which need to be verified.
For this purpose, we adopt a two-phase verification process in our tool FTOS-Verify:
• (Phase 1: Verification on the platform independent layer) We first assume that the deterministic assumption
holds in all deployments. Based on this assumption, we construct a verification model. The model is an abstract
machine (closed model) where injection of faults is regulated based on the fault model. The model offers precision
by revealing detailed mechanisms of fault-tolerance. Our theoretical foundation enables us to construct a concise
model with huge benefits (see footnote 3). For this phase, the mathematical formulation and the proofs of the theorems are stated in
[6]; they are not the focus of this paper.
• (Phase 2: Validity checking of the behavior-architectural mappings) In this phase, we have to focus on two
aspects. First, we have to check whether the deterministic assumption holds on the platform. Second, we have to
check whether deadlines can be violated in such a way that the violation exceeds the constraint specified
and regulated in the fault model. Note that since the correctness of the data and mechanisms is checked in the first
phase, in the latter phase only protocol checking (timing) is needed. This will be the focus and the main contribution
of the paper. For the analysis of the temporal behavior, we transform the models in FTOS to communicating timed
automata (CTA). In the following sections, we will describe our observations, relevant parts of the construction
process, and theoretical criteria for model efficiency. By using a generalized view, the results are applicable not only
in the context of FTOS, but can be used for verifying temporal behavior for generic distributed real-time systems.
III. SYSTEM MODELING AND OBSERVATIONS
We use an extended format of communicating timed automata (CTA) [4] using variables of finite domain to express the
features of the behavioral model. It is important to mention that this extended format does not change the expressiveness
of CTA.
Footnote 3: Our theorem states that we can construct a synchronous verification model (exponentially smaller reachable state space) provided that (1) the
deterministic assumption holds and (2) the properties are local (in-machine) LTL properties without using the temporal operator X. This makes formal
verification of large systems practicable.
Figure 3. Timed automaton representing point-to-point transmission with capacity 1.
Definition 1. A system of communicating timed automata is a tuple $S = \{A_1, \ldots, A_n\}$, where $A_i = (Q_i, V_i, C_i, Sync_i, q_i, Jump_i, Inv_i)$
is an automaton with the following constraints.
• $Q_i$ is a finite set of modes (locations).
• $V_i$ is the set of finite-domain integer variables.
• $C_i = \{c_{i_1}, \ldots, c_{i_m}\}$ is the set of clock variables.
• $Sync_i = \{s_{i_1}, \ldots, s_{i_n}\}$ is the set of synchronizers; each synchronizer $s$ is of the format $s \in \{?, !\} \times \Sigma$, where
elements of $\Sigma$ represent synchronizer symbols. Conceptually, "?" represents receiving, and "!" represents sending.
• $q_i \in Q_i$ is the initial location of the automaton.
• $Jump_i : Q_i \times Guards_i \times Sync_i \to Q_i \times Resets_i$ is the jump from mode to mode.
1) $Guards_i$ is the conjunction of inequalities of the form $c_{i_x} \sim k$ or $v_{j_x} \sim k'$, where $c_{i_x} \in C_i$, $v_{j_x} \in V_j$, $j = 1 \ldots n$,
$k, k' \in \mathbb{N}$, and $\sim \in \{=, >, <\}$.
2) $Resets_i$ is the set of assignments of the form $c_{i_x} := 0$ or $v_{i_x} := k'$, where $c_{i_x} \in C_i$, $v_{i_x} \in V_i$, and $k' \in \mathbb{N}$.
• $Inv_i$ is the set of mode invariants mapping a mode to a subspace of $\mathbb{R}^{|C_i|}$ indicating the possible clock values to
maintain in the mode.
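To make the tuple of Definition 1 concrete, the following is a minimal Python sketch of one automaton as a data structure. It is an illustration under stated assumptions, not the representation used in FTOS-Verify: the class names `Jump` and `TimedAutomaton`, the encoding of guards as sets of `(name, op, constant)` atoms, and the helper `event_agent` are all hypothetical. As a simplification, `check` only accepts guards over the automaton's own clocks and variables, whereas Definition 1 also allows variables of other automata.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set, Tuple

Atom = Tuple[str, str, int]        # (name, op, constant) with op in {"=", ">", "<"}
Assign = Tuple[str, int]           # (name, value), read as name := value
Sync = Optional[Tuple[str, str]]   # ("!" or "?", symbol); None means no synchronizer


@dataclass(frozen=True)
class Jump:
    source: str
    guard: FrozenSet[Atom]
    sync: Sync
    target: str
    resets: FrozenSet[Assign]


@dataclass
class TimedAutomaton:
    modes: Set[str]
    variables: Set[str]
    clocks: Set[str]
    syncs: Set[Tuple[str, str]]
    initial: str
    jumps: Set[Jump] = field(default_factory=set)
    invariants: Dict[str, FrozenSet[Atom]] = field(default_factory=dict)

    def check(self) -> None:
        """Raise ValueError if a constraint of Definition 1 is violated."""
        if self.initial not in self.modes:
            raise ValueError("initial location is not a mode")
        names = self.clocks | self.variables
        for j in self.jumps:
            if j.source not in self.modes or j.target not in self.modes:
                raise ValueError("jump uses an unknown mode")
            if j.sync is not None and j.sync not in self.syncs:
                raise ValueError("jump uses an unknown synchronizer")
            for name, op, k in j.guard:
                if name not in names or op not in {"=", ">", "<"} or k < 0:
                    raise ValueError("ill-formed guard atom")
            for name, value in j.resets:
                if name not in names or value < 0:
                    raise ValueError("ill-formed assignment")
                if name in self.clocks and value != 0:
                    raise ValueError("clocks may only be reset to 0")


def event_agent(ltba: int) -> TimedAutomaton:
    """Hypothetical one-location agent that emits !event when clock t exceeds ltba."""
    loop = Jump("q", frozenset({("t", ">", ltba)}), ("!", "event"), "q",
                frozenset({("t", 0)}))
    return TimedAutomaton({"q"}, set(), {"t"}, {("!", "event")}, "q", {loop})
```

A frozen `Jump` keeps jumps hashable, so `Jump_i` can be stored as a set, which matches the set-based edit operations defined later in this section.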
In the following, we summarize required components of the verification model and outline our observations.
A. Network Element with Finite Capacity
To model the network of the distributed system, an appropriate level of detail must be selected. In general, for a
network with message delay and n junction points, we have to model such a network with n(n − 1) automata to
handle point-to-point communication. Fig. 3 is the template (defined in UPPAAL [4]) of a timed automaton which
models the point-to-point transmission with storage capacity equal to 1, and one overflow location. The function
decipher(source,dest) is used to return the index of the channel.
Observation 1. For the modeling of network components, only finite capacity can be represented. Furthermore, the number of
controlled locations grows exponentially as the allowed storage capacity increases, because the variable delay (due
to the fault model) or the routing scheme may lead to an arbitrary ordering of arrived messages (see footnote 4).
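The count of $n(n-1)$ point-to-point automata and the role of a channel index can be illustrated with a short Python sketch. The text only states that decipher(source,dest) returns the index of the channel; the concrete numbering below, the zero-based junction identifiers, and the extra parameter `n` are assumptions made for this example.

```python
def channel_count(n: int) -> int:
    """Number of directed point-to-point channels among n junction points."""
    return n * (n - 1)


def decipher(source: int, dest: int, n: int) -> int:
    """Map an ordered pair of distinct junction points to a channel index.

    Hypothetical numbering: channels are grouped by source, and within one
    group the destinations are numbered in increasing order, skipping source.
    """
    if not (0 <= source < n and 0 <= dest < n) or source == dest:
        raise ValueError("source and dest must be distinct junction points in [0, n)")
    offset = dest if dest < source else dest - 1
    return source * (n - 1) + offset
```

With this numbering every ordered pair of distinct junction points receives a distinct index in the range 0 to $n(n-1)-1$, so one automaton instance per index suffices.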
B. Task Element with Finite Precision
For Giotto-like MoCs, tasks are units that perform dedicated computations. The model of task execution varies
depending on whether the applied scheduling is preemptive. Fig. 4 shows a timed automaton representing the
task execution with potential context switches. Since context switches and preemption can occur, the task must
record the remaining portion of its execution. The variable percentage represents the progress of
execution and increment reflects the minimal advance related to the time accuracy used during verification. The
constraints regarding time accuracy imply finite precision in the model.
Observation 2. Modeling of tasks can only be achieved with finite precision, since with context switch, we need to
record the portion of executed tasks. This also brings issues between expressiveness and complexity; a better accuracy
regarding the timing behavior of the context switch (with finer time unit) leads to increasing complexity of the resulting
model since it depends on the biggest integer used in the system.
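The trade-off in Observation 2 can be made tangible with a small Python sketch of the percentage/increment bookkeeping. The function names, the fixed scale of 0 to 100, and the per-tick schedule encoding are assumptions for illustration; the automaton of fig. 4 may organize this differently.

```python
from typing import Iterable, Optional


def distinct_progress_values(increment: int) -> int:
    """Number of values the progress variable can take on a 0..100 scale."""
    if increment <= 0 or 100 % increment != 0:
        raise ValueError("increment must be a positive divisor of 100")
    return 100 // increment + 1


def ticks_to_finish(increment: int, running: Iterable[bool]) -> Optional[int]:
    """Advance progress by `increment` in every tick where the task runs.

    `running` lists, tick by tick, whether the task holds the processor
    (True) or is preempted (False). Returns the tick in which the task
    completes, or None if the schedule ends first.
    """
    percentage = 0
    for tick, is_running in enumerate(running, start=1):
        if is_running:
            percentage += increment
        if percentage >= 100:
            return tick
    return None
```

A finer time unit corresponds to a smaller increment: moving from an increment of 25 to an increment of 1 raises the number of distinct progress values from 5 to 101, which is the growth in model size that the observation refers to.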
Footnote 4: As the synchronizer in CTA takes no time, the timing and the ordering of messages should be modeled in the network automaton.
Figure 4. A timed automaton to represent task execution with context switch.
C. Job Processing Element
The task of the job processing element is to manage the execution of tasks and to implement the inter-task communication.
The construction in timed automata may vary with the concrete application. However, for FTOS, a fixed
sequence of atomic actions is defined in the software model. Each atomic action can be represented by a model similar
to the task model described previously. The main difference is that the models of the atomic actions are linked
together instead of having a closed loop in the automaton representing the job processing element. Here we omit the
detailed construction process for the original model and focus on the transformation into the corresponding verification
model. We give two motivating examples.
• In order to model the effect of faults, we need to add additional edges on the original model to represent the
occurrence of faults.
• To observe deadline violation, additional clocks that reflect the time progress since event occurrence, locations that
represent the deadlines, and jumps are required to annotate the original model.
For these purposes, we define this annotation as a sequence of edit operations over a labeled graph [14]; this facilitates
the mathematical formulation of how we transform between models.
Definition 2. Define five atomic edit actions as follows (see footnote 5).
1) Clock add: Given a clock variable $c$, $\lambda X.\mathit{clock\_add}(X, c)$ is an operation that adds a clock to $X$. Formally
speaking, given $A_i = (Q_i, V_i, C_i, Sync_i, q_i, Jump_i, Inv_i)$, the result of $\mathit{clock\_add}(A_i, c)$ is a new timed automaton
$A'_i = (Q_i, V_i, C_i \cup \{c\}, Sync_i, q_i, Jump_i, Inv_i)$.
2) Variable add: Given a variable $v$, $\lambda X.\mathit{var\_add}(X, v)$ is an operation that adds a variable to $X$. Formally speaking,
given $A_i = (Q_i, V_i, C_i, Sync_i, q_i, Jump_i, Inv_i)$, the result of $\mathit{var\_add}(A_i, v)$ is a new timed automaton
$A'_i = (Q_i, V_i \cup \{v\}, C_i, Sync_i, q_i, Jump_i, Inv_i)$.
3) Location add: Given a location $q$ and an invariant $inv$, where $inv$ is the conjunction of inequalities of the form
$c_{i_x} \sim k$ with clock $c_{i_x}$, $k \in \mathbb{N}$, and $\sim \in \{=, >, <\}$, $\lambda X.\mathit{vertex\_add}(X, q, inv)$ is an operation that adds a location
to $X$ with invariant condition $inv$. Formally speaking, let $A_i = (Q_i, V_i, C_i, Sync_i, q_i, Jump_i, Inv_i)$; the result of
$\mathit{vertex\_add}(A_i, q, inv)$ is a new timed automaton $A'_i = (Q_i \cup \{q\}, V_i, C_i, Sync_i, q_i, Jump_i, Inv_i \cup \{inv\})$.
4) Jump add: Given two locations $q, q' \in Q_i$ with guard $g$, assignment $a$, and set of synchronizers $s$, where
a) $g$ is the conjunction of inequalities of the form $c_{i_x} \sim k$ or $v_{j_x} \sim k'$, where $c_{i_x}$ is a clock, $v_{j_x}$ is a variable,
$k, k' \in \mathbb{N}$, and $\sim \in \{=, >, <\}$;
b) $a$ is the set of assignments of the form $c_{i_x} := 0$ or $v_{i_x} := k'$, where $c_{i_x}$ is a clock, $v_{i_x}$ is a variable, and
$k' \in \mathbb{N}$.
Let $A_i = (Q_i, V_i, C_i, Sync_i, q_i, Jump_i, Inv_i)$; then the result of $\mathit{jump\_add}(A_i, q, g, a, s, q')$ is a new timed
automaton $A'_i = (Q_i, V_i, C_i, Sync_i, q_i, Jump_i \cup \{((q, g, s), (q', a))\}, Inv_i)$, obtained by adding the arc $((q, g, s), (q', a))$
to $Jump_i$.
5) Jump edit: Given two locations $q, q' \in Q_i$ with guards $g, g'$, assignments $a, a'$, and sets of synchronizers $s, s'$, where
a) $g, g'$ are conjunctions of inequalities of the form $c_{i_x} \sim k$ or $v_{j_x} \sim k'$, where $c_{i_x}$ is a clock, $v_{j_x}$ is a variable,
$k, k' \in \mathbb{N}$, and $\sim \in \{=, >, <\}$;
b) $a, a'$ are sets of assignments of the form $c_{i_x} := 0$ or $v_{i_x} := k'$, where $c_{i_x}$ is a clock, $v_{i_x}$ is a variable, and
$k' \in \mathbb{N}$.
Let $A_i = (Q_i, V_i, C_i, Sync_i, q_i, Jump_i, Inv_i)$; then the result of $\mathit{jump\_edit}(A_i, q, g, a, s, q', g', a', s')$ is a new
timed automaton $A'_i = (Q_i, V_i, C_i, Sync_i, q_i, (Jump_i \cup \{((q, g', s'), (q', a'))\}) \setminus \{((q, g, s), (q', a))\}, Inv_i)$, obtained by changing
the arc $((q, g, s), (q', a))$ to $((q, g', s'), (q', a'))$ in $Jump_i$.
Note that in our formulations we assume, for simplicity, that the added element is not identical to any
element in the original set, and that every newly added location or jump is well defined (e.g., to add to $A$ a new location with
invariants using clock $c$, $c$ must already have been defined in $A$).
Footnote 5: Here we merely define the edit actions necessary for our propositions and algorithms; more can be defined.
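The five edit actions can be read as pure functions that return a new automaton and leave the input unchanged. The Python sketch below follows that reading. It is a hedged illustration: the class `TA`, the storage of invariants as `(location, invariant)` pairs, the use of opaque hashable values for guards and assignments, and the helper `apply_edits` are assumptions of this example and are not taken from the paper's tool.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple

Arc = Tuple[Tuple[str, Hashable, Hashable], Tuple[str, Hashable]]  # ((q, g, s), (q', a))


@dataclass(frozen=True)
class TA:
    modes: FrozenSet[str]
    variables: FrozenSet[str]
    clocks: FrozenSet[str]
    syncs: FrozenSet[Hashable]
    initial: str
    jumps: FrozenSet[Arc]
    invariants: FrozenSet[Tuple[str, Hashable]]


def clock_add(A: TA, c: str) -> TA:
    if c in A.clocks:
        raise ValueError("clock already exists")
    return replace(A, clocks=A.clocks | {c})


def var_add(A: TA, v: str) -> TA:
    if v in A.variables:
        raise ValueError("variable already exists")
    return replace(A, variables=A.variables | {v})


def vertex_add(A: TA, q: str, inv: Hashable) -> TA:
    if q in A.modes:
        raise ValueError("location already exists")
    return replace(A, modes=A.modes | {q}, invariants=A.invariants | {(q, inv)})


def jump_add(A: TA, q, g, a, s, q2) -> TA:
    if q not in A.modes or q2 not in A.modes:
        raise ValueError("unknown location")
    return replace(A, jumps=A.jumps | {((q, g, s), (q2, a))})


def jump_edit(A: TA, q, g, a, s, q2, g2, a2, s2) -> TA:
    old, new = ((q, g, s), (q2, a)), ((q, g2, s2), (q2, a2))
    if old not in A.jumps:
        raise ValueError("arc to edit does not exist")
    return replace(A, jumps=(A.jumps - {old}) | {new})


def apply_edits(A: TA, edits: Iterable[Callable[[TA], TA]]) -> TA:
    """Apply a sequence of edit actions from left to right."""
    for edit in edits:
        A = edit(A)
    return A
```

An edit sequence is then a list of one-argument functions, for example `[lambda X: clock_add(X, "c"), lambda X: vertex_add(X, "q2", "c<5")]`, mirroring the $\lambda X$ notation above. Because every action returns a fresh automaton, the original system model stays available for comparison.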
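Definition 3. Let an edit sequence be $\hat{e} = e_1 \circ e_2 \circ \ldots \circ e_n$, where $e_1, e_2, \ldots, e_n$ are edit actions. Define the result of $\hat{e}$
on $A$, in symbols $A\langle e_1 \circ e_2 \circ \ldots \circ e_n \rangle$, inductively as follows.
• $A\langle \epsilon \rangle = A$, where $\epsilon$ is the null sequence.
• $\forall c$: $A\langle (\lambda X.\mathit{clock\_add}(X, c)) \circ e_2 \circ \ldots \circ e_n \rangle = \mathit{clock\_add}(A, c)\langle e_2 \circ \ldots \circ e_n \rangle$.
• $\forall v$: $A\langle (\lambda X.\mathit{var\_add}(X, v)) \circ e_2 \circ \ldots \circ e_n \rangle = \mathit{var\_add}(A, v)\langle e_2 \circ \ldots \circ e_n \rangle$.
• $\forall q, L$: $A\langle (\lambda X.\mathit{vertex\_add}(X, q, L)) \circ e_2 \circ \ldots \circ e_n \rangle = \mathit{vertex\_add}(A, q, L)\langle e_2 \circ \ldots \circ e_n \rangle$.
• $\forall q, q', a, s, g$: $A\langle (\lambda X.\mathit{jump\_add}(X, q, g, a, s, q')) \circ e_2 \circ \ldots \circ e_n \rangle = \mathit{jump\_add}(A, q, g, a, s, q')\langle e_2 \circ \ldots \circ e_n \rangle$.
• $\forall q, q', a, g, s, a', g', s'$: $A\langle (\lambda X.\mathit{jump\_edit}(X, q, g, a, s, q', g', a', s')) \circ e_2 \circ \ldots \circ e_n \rangle = \mathit{jump\_edit}(A, q, g, a, s, q', g', a', s')\langle e_2 \circ \ldots \circ e_n \rangle$.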
Starting from the textual description of the fault model, we can construct the set of deadline requirements $\bigcup_i (q_i, q'_i, T_i)$
for the system model $S$. Intuitively, this means that every run entering the location $q_i$ must subsequently enter $q'_i$
within at most $T_i$ time units. Based on the above definitions, we sketch the algorithm (see footnote 6) that generates the verification
model from the system model as follows:
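Algorithm: GenVerificationModelPart()
{
  /* Input: Original system model $S = \{A_1, \ldots, A_n\}$ */
  /* Output: Verification model $S_v$ */
  let $\hat{e} = \epsilon$.
  forall deadline requirements $(q_i, q'_i, T_i)$, $q_i \in Q_i$,
    /* add new clock and new variable for testing */
    $\hat{e} := \hat{e} \circ \lambda X.\mathit{clock\_add}(X, c_i)$.
    $\hat{e} := \hat{e} \circ \lambda X.\mathit{var\_add}(X, v_i)$.
    /* $q_{dl.vio_i}$ is the location for deadline violation */
    $\hat{e} := \hat{e} \circ \lambda X.\mathit{vertex\_add}(X, q_{dl.vio_i}, \phi)$.
    /* reset the clock and the flag whenever $q_i$ is entered */
    forall incoming jumps $((q, g, s), (q_i, a))$ of $q_i$,
      $\hat{e} := \hat{e} \circ \lambda X.\mathit{jump\_edit}(X, q, g, a, s, q_i, g, a', s)$,
      where $a' = a \cup \{(c_i := 0), (v_i := 0)\}$.
    endfor
    /* set the flag whenever $q'_i$ is entered */
    forall incoming jumps $((q, g, s), (q'_i, a))$ of $q'_i$,
      $\hat{e} := \hat{e} \circ \lambda X.\mathit{jump\_edit}(X, q, g, a, s, q'_i, g, a', s)$,
      where $a' = a \cup \{(v_i := 1)\}$.
    endfor
    /* jump to the violation location if $q'_i$ is not entered in time */
    forall reachable locations $q$ from $q_i$,
      $\hat{e} := \hat{e} \circ \lambda X.\mathit{jump\_add}(X, q, g, \phi, \phi, q_{dl.vio_i})$,
      where $g$ is defined as $(v_i = 0) \wedge (c_i > T_i)$.
    endfor
  endfor
  return $S_v := S\langle \hat{e} \rangle$. /* apply changes in $\hat{e}$ */
}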
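The annotation performed by the algorithm can be sketched in Python for a single automaton. This is an illustrative reading rather than the FTOS-Verify implementation: the class `TA` keeps only the components the algorithm touches, jumps are stored as `(q, g, s, q', a)` tuples with guards and assignments encoded as sets of strings, the names `c1`, `v1`, `dl_vio1` are generated placeholders, and the location $q_i$ itself is counted among the locations reachable from $q_i$. All of these are assumptions of the example.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Optional, Sequence, Set, Tuple

Jump = Tuple[str, FrozenSet[str], Optional[str], str, FrozenSet[str]]  # (q, g, s, q', a)


@dataclass(frozen=True)
class TA:
    modes: FrozenSet[str]
    variables: FrozenSet[str]
    clocks: FrozenSet[str]
    jumps: FrozenSet[Jump]


def reachable(jumps: Iterable[Jump], start: str) -> Set[str]:
    """Locations reachable from `start` (including `start`), ignoring guards."""
    jumps = list(jumps)
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for src, _g, _s, dst, _a in jumps:
            if src == q and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def gen_verification_model(A: TA, requirements: Sequence[Tuple[str, str, int]]) -> TA:
    """Annotate A for each deadline requirement (q_i, q_i_prime, T_i)."""
    violation_modes: Set[str] = set()
    for i, (q_i, q_done, t_i) in enumerate(requirements, start=1):
        c_i, v_i, q_vio = f"c{i}", f"v{i}", f"dl_vio{i}"
        # Violation locations added for earlier requirements get no new jumps.
        sources = reachable(A.jumps, q_i) - violation_modes
        jumps = set()
        for q, g, s, dst, a in A.jumps:
            if dst == q_i:       # entering q_i: restart the clock, clear the flag
                a = a | {f"{c_i}:=0", f"{v_i}:=0"}
            if dst == q_done:    # entering q_i_prime: deadline met, set the flag
                a = a | {f"{v_i}:=1"}
            jumps.add((q, g, s, dst, a))
        guard = frozenset({f"{v_i}=0", f"{c_i}>{t_i}"})
        for q in sources:        # deadline missed while the flag is still clear
            jumps.add((q, guard, None, q_vio, frozenset()))
        violation_modes.add(q_vio)
        A = replace(A, modes=A.modes | {q_vio}, variables=A.variables | {v_i},
                    clocks=A.clocks | {c_i}, jumps=frozenset(jumps))
    return A
```

For a three-location cycle idle, busy, done with the requirement `("busy", "done", 5)`, the sketch adds one clock, one flag variable, one violation location, and one violation jump from each location of the cycle. Checking the deadline then amounts to asking whether the violation location is reachable.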
For the deterministic assumption mentioned in section II-C, similar algorithms can be applied to annotate
clocks, locations, and jumps; the problem of checking the deterministic assumption in FTOS turns out to be a reachability
problem in timed automata.
D. Dispatcher
With respect to the operating system, we have to model the dispatcher explicitly. The modeled dispatcher merely
captures the scheme for the execution of threads; deadline violation, fault-tolerance or error handling is modeled in
the job processing element. Therefore, it can be used in arbitrary settings and not only in FTOS. Due to different
scheduling algorithms, the model of the dispatcher differs dramatically regarding actual verifiability. For our analysis,
we use priority based dispatchers modeling either FIFO or round-robin techniques. Nevertheless, as context switch of
tasks/threads occurs, we have the following observation.
Observation 3. Using a round-robin dispatcher leads to exponential increase of possible behaviors compared to a
FIFO-based dispatcher with the number of parallel tasks, if no assumptions on the task behavior can be made.
In summary, this section gave insight into the main components of the verification model and their construction. Apart from
the job processing element, all components and related observations can be directly applied to arbitrary real-time systems.
For the job processing element, we described a generic way to use annotations to construct a model for verifying
the absence of deadline violations. In the next section, we point out how aperiodic behavior introduced by faults or
events can be considered.
Footnote 6: This editing algorithm is not general; it relies on the fact that, for a requirement $(q_i, q'_i, T_i)$, $q_i$ is not reentered before entering $q'_i$, because actions in the job processing
element are chained.
Figure 5. A sample timed automaton representing the event agent.
IV. INVOCATION OF FAULTS AND APERIODIC EVENTS
To perform verification, the arrival of faults or aperiodic events must be modeled to establish a closed model;
in this section we consider its effect. In FTOS, the probability of faults is implicitly reflected by the concept called
least time between faults (LTBF). In our analysis, the invocation of aperiodic tasks can be handled similarly: the least time
between occurrences of events for aperiodic tasks is defined as least time between arrivals (LTBA). With LTBA or LTBF,
we can augment the original model with a timed automaton producing the event (called event agent) similar to fig. 5.
However, since LTBF (or LTBA) is an integer that might be relatively large, and the complexity of verification in timed
systems is related to this integer (see footnote 7), the use of LTBF (or LTBA) may hinder the practicability of model checking. Thus
we propose methods that reduce the value of LTBF (or LTBA) under an equivalence criterion. For simplicity,
the following propositions are all discussed using event-triggered aperiodic functions with LTBA, without loss of
generality.
Proposition 1. Let system $S$ have one FTOS function with periodic deadline $T$ and one event-triggered aperiodic
function.
• W.l.o.g., let $A_{event} = (\{q\}, \emptyset, \{t\}, \{!event\}, q, \{((q, (t > LTBA_S), \{!event\}), (q, (t := 0)))\}, \phi)$ be the timed
automaton of the event agent, where $LTBA_S = T_S$ is the least time between two consecutive aperiodic events.
Let $T_p$ be the maximal time interval for the system to finish processing the event (called deadline interval from now
on; see footnote 8). If $T_S > T_p + T$, then consider another system $S'$, where
$S' = \mathit{jump\_edit}(S, q, (t > T_S), (t := 0), \{!event\}, q, (t > T_p + T), (t := 0), \{!event\})$, i.e., the only difference is that $LTBA_S$ is changed from $T_S$ to $T_p + T$. Then both systems are
equivalent regarding their behavior concerning deadline violation. That is, $S$ and $S'$ either both satisfy the
deadline or both miss the deadline.
An intuitive argument for the bound $T_p + T$ can be derived using fig. 6. Tasks and events influence the execution of
each other. The execution of an arriving event is influenced by the currently running task. During the deadline interval
of the event, this and all preceding tasks are influenced as well. The chain of influence can only be stopped if the
execution is decoupled. Since preceding tasks are decoupled by definition, two events separated by at least $T_p + T$
cannot influence the execution of the same task. In fig. 6, we call a time point $\hat{t}$ a decoupling point if the two consecutive
tasks immediately before and after $\hat{t}$ are not mutually influenced due to the occurrence of an event.
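The reduction stated in Proposition 1 is a one-line computation, sketched below in Python. The function name and the numeric values in the usage note are hypothetical; only the rule "replace $T_S$ by $T_p + T$ when $T_S > T_p + T$" comes from the proposition.

```python
def reduced_ltba(t_s: int, t_p: int, period: int) -> int:
    """Return the LTBA to use in the verification model.

    t_s:    original least time between arrivals (T_S)
    t_p:    deadline interval of the event (T_p)
    period: period of the FTOS function (T)
    """
    bound = t_p + period
    # Proposition 1 only applies when the original value exceeds the bound.
    return bound if t_s > bound else t_s
```

For instance, with an assumed LTBA of 500, a deadline interval of 20, and a period of 10, the event agent can use the guard constant 30 instead of 500, which reduces the largest integer in the model.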
Figure 6. Illustrations for proofs of Proposition 1. [Figure: timelines of the periodic function (period $T$) and the event-driven function, showing the $(i-1)$-th and the $i$-th arrival of the event, each followed by a deadline interval $T_p$, with $T_{S'} > T + T_p$ between the arrivals, the influence of the event on the periodic executions, and a decoupling point $\hat{t}$.]
Footnote 7: The reachability problem for timed automata is PSPACE-complete, i.e., the complexity is exponential in (1) the number of clocks and (2) the
maximum integer used in the system. Concerning (2), if the maximum number changes from 10 to 100, intuitively the execution time can increase
by a factor of $k^{90}$, where $k > 1$.
Footnote 8: Let $t_{arrival}$ be the time of the event arrival. If the system cannot finish processing this event within time $t_{arrival} + T_p$, then the system violates
the deadline.
Proof: We consider four possible cases in $S'$:
1) Consider the case where it is proven that no deadline is violated in $S'$. When the verification engine proves that
the deadline is never violated with $LTBA_{S'} = T_{S'}$ in $S'$, the deadline of the FTOS function in $S$ will never be
violated because $T_S > T_{S'}$; the verification engine has already considered all cases in $S$.
2) Consider the case where, in $S'$, the counter-example indicates that the $i$-th aperiodic task violates the deadline.
We further distinguish whether it is the first time for $S'$ to process the event. Our goal is to
construct a counter-example for deadline violation in $S$ from the counter-example in $S'$.
a) If $i = 1$, i.e., it is the first time for $S'$ to execute the aperiodic task, then this deadline violation can also
occur in $S$, since no constraints are made for the first occurrence of events in $S$ or $S'$.
b) If $i \neq 1$, consider the $(i-1)$-th aperiodic execution, which does not violate the deadline. Let the arrival time
of the $(i-1)$-th event be $t$, and let the interval between the $(i-1)$-th and the $i$-th event be $T'$. The
system should finish the $(i-1)$-th processing before time $t + T_p$. Since $T' > LTBA_{S'} = T_p + T$, between time
$t + T_p$ and $t + T'$ the FTOS function finishes one of its executions and starts a new one. Let the time of the
start of that cycle be $\hat{t}$. If we change the counter-example time trace such that no event has happened before
$\hat{t}$, we still get a counter-example trace in $S'$. This new counter-example trace is also a counter-example trace
in $S$.
3) Consider the case where, in $S'$, the counter-example indicates that the FTOS function violates the deadline. Let
the time at which the deadline is violated be $t$ (note that $t$ is a multiple of $T$). Let the occurrence time of the nearest event
be $t'$ (if no such event exists, then both $S$ and $S'$ can deadlock).
a) If $t - t' \geq T_p + T$, then the event is processed before time $t - T$, the start of the period that violates
the deadline. Thus the system violates the deadline with only the FTOS function present, and hence
the deadline is also violated in $S$.
b) If $t - t' < T_p + T$, we consider whether the event is the first one being processed.
i) If yes, then the counter-example in $S'$ is also a counter-example in $S$.
ii) If not, then consider the time at which the previous event occurs, and let this time be $t''$. Since $t' - t'' > T + T_p$,
we can find a decoupling point $\hat{t}$ with $t'' + T_p \leq \hat{t} \leq t'$ at which a new period starts. In this way,
we can apply the technique stated in (2-b) before $\hat{t}$.
4) Consider the case where, in $S'$, the counter-example indicates that both the $i$-th aperiodic task and the FTOS
function violate the deadline. Let the time at which the deadline is violated be $t$ (note that $t$ is a multiple of $T$); then
the event occurs at time $t - T_p$. By an argument similar to point 3-b, a counter-example trace in $S$ can be
established.
Remark: (1) Proposition 1 formulates the insight that events that occurred long before cannot influence the current
processing and scheduling and are therefore not the root cause of a deadline violation. In other words, we can always
construct a counter-example with a single event as root cause. (2) The introduction of faults can be viewed analogously.
For faults, in FTOS (or similar fault-tolerant systems) the value of $T_p + T$ is much smaller than $T_S$ (LTBF), which
brings significant advantages for the construction of a model with a smaller state space.
Proposition 1 can only be used for very simple systems with only one task and one event. In the following, we
generalize the result to systems consisting of one periodic function with period $T$ and several aperiodic functions.
Proposition 2. Let $S$ be a system with $n$ aperiodic functions. Each function with index $i$, where $i = 1 \ldots n$, is associated
with a pair $(LTBA_i, T_{p_i}) \in \mathbb{N} \times \mathbb{N}$ describing the LTBA and deadline interval. Consider another system $S'$, where the
only difference is the following change: if for all $i = 1 \ldots n$, $LTBA_i > T \left( \sum_{i=1 \ldots n} \lceil T_{p_i}/T \rceil \right) + T$, then we
change $LTBA_i$ to $T \left( \sum_{i=1 \ldots n} \lceil T_{p_i}/T \rceil \right) + T$. Both systems $S$ and $S'$ are equivalent regarding their behavior concerning
deadline violation.
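The bound of Proposition 2 can be evaluated mechanically, as in the following Python sketch. The function names and the example numbers are assumptions; the formula $T (\sum_i \lceil T_{p_i}/T \rceil) + T$ and the condition that every $LTBA_i$ must exceed it are taken from the proposition.

```python
from typing import List, Sequence, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b, without floating point."""
    return -(-a // b)


def ltba_bound(period: int, deadline_intervals: Sequence[int]) -> int:
    """T * (sum over i of ceil(Tp_i / T)) + T."""
    return period * sum(ceil_div(tp, period) for tp in deadline_intervals) + period


def reduce_ltbas(period: int, functions: Sequence[Tuple[int, int]]) -> List[int]:
    """Return the LTBA values to use, given (LTBA_i, Tp_i) pairs.

    The values are replaced by the bound only if every LTBA_i exceeds it;
    otherwise they are returned unchanged.
    """
    ltbas = [ltba for ltba, _tp in functions]
    bound = ltba_bound(period, [tp for _ltba, tp in functions])
    if all(ltba > bound for ltba in ltbas):
        return [bound] * len(ltbas)
    return ltbas
```

As a worked instance with period 10 and the pairs (500, 25) and (300, 10): the ceilings are 3 and 1, so the bound is 10 * 4 + 10 = 50, and both LTBA values can be lowered to 50.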
Proof: We consider the following cases.
1) If S′ does not violate the deadline, then neither does S.
2) Consider the case where in S′, the counter-example indicates that the i-th aperiodic task of type j violates the
deadline.
a) If $i \neq 1$, let the times of the i-th and (i−1)-th arrivals of type-j events be t and t′. Our goal is to find the
decoupling point $\hat{t}$ such that we can disregard all events that happened before it.
Since $t - t' > T \left( \sum_{i=1}^{n} \lceil Tp_i / T \rceil \right) + T$, within [t′, t] the periodic function is
executed at least $\alpha = \left( \sum_{i=1}^{n} \lceil Tp_i / T \rceil + 1 \right) - 1$ times. Consider the worst case
where it is executed only α times. Within [t′, t], there are α+1 potential decoupling points
$t_0, \ldots, t_{\alpha+1}$, where $\forall k = 1 \ldots \alpha+1,\ t_k - t_{k-1} = T$.
Due to the sparsity of events, each type of event arrives at most once within [t′, t]. For each type m, the
corresponding event with deadline interval $Tp_m$ overlaps in the worst case at most $\lceil Tp_m / T \rceil$ of these
potential decoupling points. Thus the total number of overlapped points is at most
$\sum_{i=1}^{n} \lceil Tp_i / T \rceil = \alpha$, which is less than the number of points among
$\{t_0, \ldots, t_{\alpha+1}\}$. Therefore, there exists at least one point $t_g \in \{t_0, \ldots, t_{\alpha+1}\}$
that is not overlapped by any deadline interval. Thus we can set the decoupling point $\hat{t}$ to $t_g$. As
a result, we can construct an equivalent counter-example where no event has happened before $\hat{t}$. This new
counter-example trace is also a counter-example trace in S.
b) If i = 1, let the time of the i-th arrival of type-j events be t.
i) If, for every type of event, the corresponding events occurred at most once before t, then the counter-example
is also a counter-example in S.
ii) If some type of event occurred more than once: let $c_m$ for type m be the total number of
events that occurred in the counter-example and $t_{c_m}$ be the latest event arrival time. Choose m′ such that
$c_{m'} > 1$ and $\forall m = 1 \ldots n,\ m \neq j,\ t_{c_{m'}} > t_{c_m}$. Then we can find the decoupling point
between the $(c_{m'} - 1)$-th and $(c_{m'})$-th arrival of events of type m′, similar to the argument in case 2-a.
3) Consider cases where the deadlock happens in the FTOS function.
a) If in the counter-example no event has occurred, then both S and S′ can deadlock.
b) Otherwise, first we try to pick an event based on arguments in 2-b-ii. If possible, then the decoupling point
can be found, and the counter-example for S can be established. If selection based on 2-b-ii is not possible,
it follows the statement of 2-b-i that the counter example for S′ is also a counter-example in S.
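The counting argument of case 2-a can be sketched as a small search. This is an illustrative sketch only: the function name `find_decoupling_point` is hypothetical, and each deadline interval is assumed to be a half-open window [arrival, arrival + Tp), so that it covers at most ⌈Tp/T⌉ period starts.

```python
def find_decoupling_point(T, first_point, deadline_windows):
    """Return a period start that no deadline window overlaps, or None.

    T                -- period of the periodic function
    first_point      -- first candidate period start inside the gap
    deadline_windows -- list of (arrival, Tp) pairs, at most one per event
                        type, each read as the window [arrival, arrival + Tp)
    """
    # alpha bounds how many candidates the windows can cover in total.
    alpha = sum(-(-tp // T) for _, tp in deadline_windows)
    # alpha + 1 candidates guarantee one uncovered point (pigeonhole).
    candidates = [first_point + k * T for k in range(alpha + 1)]
    for point in candidates:
        covered = any(a <= point < a + tp for a, tp in deadline_windows)
        if not covered:
            return point
    return None


if __name__ == "__main__":
    # Two event types: windows [95, 110) and [118, 125), period T = 10.
    print(find_decoupling_point(10, 100, [(95, 15), (118, 7)]))
```

In this example the candidate 100 lies inside the first window, while 110 is covered by neither window and is returned as the decoupling point.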
Lastly, we discuss the most general case.
Proposition 3. Let system S have m periodic FTOS/Giotto functions with periodic deadlines $T_{f_i}$, where
i = 1 . . . m, and n aperiodic functions. Each aperiodic function with index j, where j = 1 . . . n, is associated with a pair
$(LTBA_j, Tp_j) \in \mathbb{N} \times \mathbb{N}$ describing the LTBA and deadline interval. Consider another system S′,
where the only difference is to perform the following change: if for all j = 1 . . . n,
$LTBA_j > T' \left( \sum_{j=1}^{n} \lceil Tp_j / T' \rceil \right) + T'$, where T′ is the least common multiple of
$T_{f_1}, T_{f_2}, \ldots, T_{f_m}$, then we change $LTBA_j$ to
$T' \left( \sum_{j=1}^{n} \lceil Tp_j / T' \rceil \right) + T'$. Both systems S and
S′ are equivalent regarding their behavior concerning deadline violation.
Proof: The main difference to the previous case is that periodic functions with different tasks might influence each
other. Potential decoupling points occur only at points in time, where all tasks start together. The proof idea is to view
multiple periodic functions as a whole by taking the least common multiple. Here we omit the detailed proof.
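A minimal sketch of the bound in Proposition 3, assuming the same integer setting as before: the periods are first collapsed into their least common multiple T′, and the bound of Proposition 2 is then evaluated with T′ in place of T. The names `hyperperiod` and `ltba_cap_multi` are hypothetical.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple T' of all periodic deadlines."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def ltba_cap_multi(periods, deadline_intervals):
    """Bound T' * sum(ceil(Tp_j / T')) + T' with T' = lcm(periods)."""
    t_prime = hyperperiod(periods)
    # -(-a // b) is the ceiling of a / b for positive integers.
    return t_prime * sum(-(-tp // t_prime) for tp in deadline_intervals) + t_prime


if __name__ == "__main__":
    # Periods 4, 6 and 10 give T' = 60; deadline intervals 15 and 70.
    print(ltba_cap_multi([4, 6, 10], [15, 70]))
```

For periods 4, 6 and 10 the least common multiple is 60; with deadline intervals 15 and 70 the ceilings are 1 and 2, so the bound is 60 · 3 + 60 = 240. The example also shows that the bound grows with the least common multiple, so periods with few common factors yield a weaker reduction.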
V. IMPLEMENTATION
For implementation, we extend the functionality of FTOS-Verify to test the applicability. The verification model is
constructed in a format acceptable by UPPAAL [4]. Note that templates in UPPAAL are not completely suitable for our
usage, since they only represent a fixed behavior with configurable parameters. Therefore, algorithms to automatically
generate timed automata based on FTOS models are needed. We have implemented our automated M2M transformation
tool using openArchitectureWare⁹ under the Eclipse Modeling Framework¹⁰.
As a use case, we apply the verification in the context of our balanced-rod example¹¹, where the control functions are
replicated on three redundant machines to guarantee fault-tolerance. All components mentioned previously are generated
by our automatic conversion technique; the resulting UPPAAL system has 25 communicating timed automata. As timing
information for the different components, we currently use user-specified assumptions. An integration of WCET analyzers
is foreseen. One desired property is the guaranteed absence of deadline violations, which turns out to be a
reachability property in UPPAAL.
On an Intel 2.33 GHz machine, using a FIFO-based priority-driven scheduler, the overall execution time varies from 1 to
25 minutes depending on the accuracy of the verification model. The memory consumption can reach up to 850 MB.
Verification using a Round-Robin based scheduler proved to be too memory-consuming.
VI. RELATED WORK
We mention related work, but restrict ourselves to work on the analysis of Giotto-like systems; for techniques
applying formal verification in real-time analysis, we refer readers to the survey paper by Wang [13]. In Giotto, the
Giotto-Compiler will perform hardware mapping and apply analysis techniques to check schedulability. Many design
tools with Giotto-like MoCs apply similar approaches, for example, TDL [11] or HTL [7], but analysis techniques are
not explicitly mentioned. One interesting work comes from COMDES-II project [9], which is also based on the concept
of logical execution time; here, researchers apply model transformation from system models to verification models.
Nevertheless, as we focus on fault-tolerant systems, our work differs from the above works in the following respects.
First, we encounter a harder problem; by applying software fault-tolerance, modeling the communication between multiple
deployed units is required, and this is not required by other Giotto-like MoCs. For those MoCs, scheduling analysis
developed in the real-time community could be enough without the use of model checking. Furthermore, by proposing the
similarity between aperiodic events and fault occurrences, our theoretical criteria can dramatically reduce the
complexity of the model (not of the verification algorithm). This is based on our understanding of the constituents of
the complexity of timed verification.
9 http://www.openarchitectureware.org
10 http://www.eclipse.org/modeling/emf/
11 For configurations, see http://www6.in.tum.de for details.
VII. CONCLUSION
In this paper, we discussed the issue of constructing a model to verify timing assumptions in the context of FTOS
using timed automata. However, due to our general approach, the results can be applied to arbitrary distributed real-time
systems.
Our contribution can be summarized as follows.
1) We give observations concerning modeling of general distributed real-time systems using timed automata and
formulate our verification model construction process.
2) In the context of systems consisting of periodic and aperiodic tasks, we give theoretical criteria for reducing
the size of the verification model, which is particularly useful for our approach. Reducing the maximum
integer used in the system decreases the required verification time on an exponential scale.
3) We constructed a software prototype for the conversion process and performed preliminary experiments.
Our work is currently based on user-specified assumptions regarding the timing of the involved components. The next
step will be the integration of WCET analysis tools to obtain a faithful verification result.
Furthermore, we are investigating approaches to separate the verification problem for control functions executed
in parallel, to make our approach applicable to large-scale applications as well.
REFERENCES
[1] Eclipse Modeling Framework. http://www.eclipse.org/modeling/emf/.
[2] openArchitectureWare Project. http://www.openarchitectureware.org/.
[3] R. Alur and D.L. Dill. A theory of timed automata. Theoretical computer science, 126(2):183–235, 1994.
[4] Gerd Behrmann, Alexandre David, and Kim G. Larsen. A tutorial on UPPAAL. In Marco Bernardo and Flavio Corradini, editors,
Formal Methods for the Design of Real-Time Systems: 4th International School on Formal Methods for the Design of Computer,
Communication, and Software Systems (SFM-RT’04), number 3185 in LNCS, pages 200–236. Springer–Verlag, September 2004.
[5] Christian Buckl. Model-Based Development of Fault-Tolerant Real-Time Systems. PhD thesis, Technische Universität München,
Oct 2008.
[6] C. Cheng, C. Buckl, J. Esparza, and A. Knoll. FTOS-Verify: Analysis and Verification of Non-Functional Properties for Fault-
Tolerant Systems. Technical report (arXiv:0905.3946), TU Munich, 2009.
[7] A. Ghosal, A. Sangiovanni-Vincentelli, C.M. Kirsch, T.A. Henzinger, and D. Iercan. A hierarchical coordination language
for interacting real-time tasks. In Proceedings of the 6th ACM & IEEE International conference on Embedded software
(EMSOFT’06), pages 132–141. ACM New York, NY, USA, 2006.
[8] Thomas A. Henzinger, Benjamin Horowitz, and Christoph M. Kirsch. Giotto: A time-triggered language for embedded
programming. In Proceedings of the First International Workshop on Embedded Software (EMSOFT’01), pages 166–184.
Springer-Verlag, 2001.
[9] X. Ke, P. Pettersson, K. Sierszecki, and C. Angelov. Verification of COMDES-II Systems Using UPPAAL with Model
Transformation. In Proceedings of the 14th IEEE International Conference on Embedded and Real-Time Computing Systems
and Applications (RTCSA’08), pages 153–160. IEEE Computer Society Washington, DC, USA, 2008.
[10] A. Sangiovanni-Vincentelli and G. Martin. Platform-Based Design and Software Design Methodology for Embedded Systems.
IEEE Design & Test of Computers, pages 23–33, 2001.
[11] R. Simmons and D. Apfelbaum. A task description language for robot control. volume 3, pages 1931–1937 vol.3, Oct 1998.
[12] W. Torres-Pomales. Software fault tolerance: A tutorial. Langley research Center, Hampton, Virginia, Technical Report, No.
NASA/TM-2000-210616, 2000.
[13] F. Wang. Formal verification of timed systems: A survey and perspective. Proceedings of the IEEE, 92(8):1283–1305, 2004.
[14] F. Wang and C.H. Cheng. Program Repair Suggestions from Graphical State-Transition Specifications. In Proceedings of the
28th IFIP WG 6.1 international conference on Formal Techniques for Networked and Distributed Systems (FORTE’08), volume
5048 of LNCS, pages 185–200. Springer–Verlag, 2008.
