PALS: Physically Asynchronous Logically Synchronous Systems by Sha, Lui et al.
PALS: Physically Asynchronous Logically Synchronous Systems
Lui Sha1, Abdullah Al-Nayeem1, Mu Sun1, Jose Meseguer1, Peter Ölveczky2
1University of Illinois at Urbana Champaign, 2University of Oslo
Abstract
In networked cyber physical systems real time global
computations, e.g., the supervisory control of a flight
control system, require consistent views, consistent
actions and synchronized state transitions across net-
work nodes in real time. This paper presents a real time
logical synchrony protocol, Physically Asynchronous
Logically Synchronous (PALS), to support real time
global computation. Under the PALS protocol, engi-
neers design and verify applications as if all the dis-
tributed state machines were driven by a single global
clock. The PALS protocol is optimal in the sense that
1) the bound on the periods of the real time global
computation, such as the supervisory controller, is
the shortest possible, and 2) the message overhead in
achieving logical synchrony is minimal.
Acknowledgement. Steven P. Miller and Dar-
ren Cofer have collaborated with us closely in this
work. Min Young Nam, Peter Feiler and Dionisio
de Niz have helped us greatly in AADL related
challenges. Xiaokang Qiu and Artur Boronat con-
tributed to the translation of AADL to Maude.
This work is sponsored by Rockwell Collins Inc.,
the Office of Naval Research, the National Science
Foundation, Lockheed Martin Corporation, the Re-
search Council of Norway, and the Software Engi-
neering Institute.
1. Introduction
Networked real time control systems have both
global and local computations. For example, the su-
pervisory control of a flight control system is a real
time hybrid control that 1) periodically adjusts the
setpoints of engine speeds and control surface an-
gles, and 2) performs discrete control such as mode
changes, e.g., changing the primary controller in a
dual redundant flight control system. The supervi-
sory control is a global computation because during
mode changes, the views, actions and state tran-
sitions of the distributed state machines must be
consistent with each other. Local computations may
use a combination of local data and/or a subset of
globally consistent views and commands provided
by the supervisory control. However, local compu-
tations do not need to synchronize with the others.
For example, in a flight control system, servo con-
trol at each node is a local computation, which is
solely a function of the local tracking errors with re-
spect to its setpoint. For example, consider a com-
mand given to increase the propulsion power of both
left and right engines by 10%. If the left engine has a
mechanical problem and loses power, the right en-
gine will still execute the command of increasing
power by 10%. It is up to the supervisory control to
re-adjust the setpoints and regain control.
In the aviation community, networked real time
systems are known as Globally Asynchronous Lo-
cally Synchronous (GALS) systems, because local
tasks are driven by the same clock. As a re-
sult, synchronous task interactions within a node
can be accomplished easily. On the other hand,
skews between different nodes’ clocks can only
be bounded but not eliminated. When inter-
actions between nodes are directly driven by
their local clocks, the resulting interactions be-
come asynchronous. Discrete supervisory con-
trols, such as commands for mode changes, are
very sensitive to asynchronous interactions. Suppose
that the two clocks of a dual redundant control
system have a bounded skew of ε with respect to a
global clock. As a result, one subsystem can be in
state j while the other lags by up to 2ε and remains
in state j − 1. Hence, one could receive the command
in state j while the other receives it in state j − 1,
leading to potential divergence between the
replicated machines. An
aircraft has hundreds of computers networked to-
gether. Different clocks have somewhat different
relative skews, and there can be tens to even hun-
dreds of concurrent discrete commands in the
hierarchical control of a complex modern flight con-
trol system.
To verify discrete control logics in the presence of
asynchronous interactions, a model checker has to
examine all possible state transitions under all pos-
sible clock skew combinations tick by tick. This cre-
ates a combinatorial explosion of interaction state
space. Comparing a synchronous system against an
asynchronous system for a dual redundant flight
guidance system, Miller et al. made the following
observation: “the properties themselves are more difficult to
state, were weaker than could be achieved in the syn-
chronous case, and required considerable complexity
to be added to the model to ensure that even the weak-
ened properties were true” [11].
Protocols to provide distributed processes with
consistent views in general purpose distributed com-
puting have been an active area of research. Bir-
man and Joseph [1] first introduced the process
group abstraction to achieve fault tolerant virtual
synchrony for general purpose computation. Guo,
Vogels and Renesse introduced a light weight ver-
sion [2]. They reported that their light weight ver-
sion achieved a one-way synchronization latency of
100 msec over a group of 500. Pereira, Rodrigues
and Olivera [3] exploited application-level semantics
to relax some virtual synchrony constraints. Robin-
son and Schmid [6] developed an asynchronous
bounded cycle model to support logically locked
step actions in distributed computation. Delporte-
Gallet and Fauconnier [4] presented a soft real time
atomic broadcast protocol where each message has
an explicit termination deadline, so that the atomic
broadcast will either be done in time or aborted.
Tarek, Shaikh, Jahanian, and Shin [7] developed a
light weight fault tolerance multicast and member-
ship service for real time process groups using a log-
ical ring for concurrency control. When an applica-
tion needs to send real time messages, it presents
the message with timing constraints to an admis-
sion controller to perform online schedulability anal-
ysis. Real time messages that can be scheduled will
be admitted. Otherwise, they will be rejected.
In a typical process group approach, the synchrony
management mechanism and the fault tolerance
mechanism are bundled together. This is a good idea
for many applications. However, in safety critical
CPS applications such as avionics, a system has
subsystems with different levels of reliability
requirements. For example, the avionics certification
standard DO-178B defines 5 design assurance levels [16].
Hence, we separate our real time synchrony mech-
anism from fault tolerance mechanisms, so that de-
signers can combine real time synchrony mechanism
with preferred fault tolerance mechanisms to meet
different reliability requirements.
The objective of PALS (Physically Asynchronous
Logically Synchronous) protocol is to provide the
optimal real time logical (virtual) synchronization
protocol, under which a machine Mi’s views, ac-
tions, and state transitions driven by local clocks
with bounded skews are identical to Mi’s views, ac-
tions and state transitions driven by perfectly syn-
chronized clocks (synchronized perfectly with an
idealized global clock).
Under the PALS design process, engineers first
design and verify their application as if it is a syn-
chronous system. Each component is then allocated
to distributed nodes driven by clocks with bounded
skews. When the PALS protocol is followed, the log-
ical behavior of this physically asynchronous system
is identical to that of the synchronous system.
The PALS protocol based design can be specified
in a formalized version of AADL [12], an architec-
ture analysis and description language. The AADL
description can be automatically translated into
Real Time Maude [14] to perform model checking. If
the design passes the check, engineers will bind the
AADL logical design to an AADL hardware speci-
fication of a networked system with bounded clock
skews between nodes. The resulting design is then
checked for conformance to the PALS protocol. Due to
the page limitation, the formal model of the PALS
architecture pattern and the design tools will be
published separately. This paper presents the PALS protocol.
2. The PALS Protocol
We first define the global computation model.
Next, we examine the properties of global computa-
tions driven by perfectly synchronized clocks with
period T. We then show that these behaviors are
logically identical to those of the same system driven
by clocks with bounded skews and period T under
the PALS protocol, in the sense that they have the
identical state transitions and identical inputs and
outputs.
G1: Local Clocks for Global Computation.
Each state machine Mi engaged in global compu-
tation is driven by a local clock Ci with the same
period T . The global clock time is denoted as t.
Clock Ci is said to be at its jth period, denoted as
Ci = j, if the global time t satisfies the constraint,
↑ (Ci = j) ≤ t <↑ (Ci = j + 1); where ↑ (Ci = j)
is the rising edge of clock Ci, when it just enters its
jth period.
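As a minimal sketch of the G1 definition, the period index of a clock can be computed from the global time. The sketch assumes that the rising edge of period j occurs at global time offset + j·T, where the fixed offset is an illustrative stand-in for the clock's skew. The function name clock_period and all numeric values are hypothetical and not part of the paper.

```python
import math


def clock_period(t: float, period: float, offset: float = 0.0) -> int:
    """Return j such that rise(j) <= t < rise(j + 1).

    Assumption: the rising edge of period j is at global time
    offset + j * period, with |offset| bounded by the clock skew.
    """
    return math.floor((t - offset) / period)


T = 10.0   # clock period (illustrative units)
EPS = 1.0  # skew bound (illustrative)

# Two clocks at opposite ends of the skew bound, observed at the same global time.
leading = clock_period(29.5, T, offset=-EPS)  # rising edges at -1, 9, 19, 29, ...
lagging = clock_period(29.5, T, offset=+EPS)  # rising edges at 1, 11, 21, 31, ...
print(leading, lagging)
```

With these illustrative values the leading clock is already in period 3 while the lagging clock is still in period 2 at the same global instant.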
All the local clocks used for global computation
are synchronized with the perfect global clock with
clock skews of at most ε. If ε = 0, all clocks are
perfectly synchronized. A system driven by perfectly
synchronized clocks and the same system driven by
the perfect global clock have the same behaviors.
Note that it is important to ensure that the clock
values are not only monotonic but also avoid large
jumps. When a clock is ahead/behind, it should
be corrected by decreasing/increasing its rate of
progress. A clock value that goes backwards or has
large jumps generates serious errors in the compu-
tation of velocity and acceleration.
Finally, in addition to global computation, a node
can also perform local computations. Local compu-
tations are modeled by different state machines. The
clocks used for local computations do not need to
be synchronized with the clocks for global compu-
tation. In practice, it is convenient to synchronize
a set of local clocks at some rate and then derive
all other clock values. For example, we may choose
to synchronize all the 100 Hz real time clocks and
then derive all the other clock rates from this 100
Hz clock, including those used for global computa-
tion.
G2: Real Time Network. The network has a
network queueing (scheduling) delay q bounded by
0 < qmin ≤ q ≤ qmax and a network transmission
delay µ bounded by 0 < µmin ≤ µ ≤ µmax.
G3: Real Time Machine. Messages arriving at
real time machine Mi during Mi’s jth clock period
are buffered. At ↑ (Ci = j + 1), Mi reads the mes-
sages from the buffer, carries out the computation,
transitions to the next state, and sends output mes-
sages. The task completion time α, including real
time scheduling, computation, and I/O is bounded
by 0 < αmin ≤ α ≤ αmax. Since state transitions
are driven by the clock, we say that a state machine
Mi is at its jth state, when its clock is at its jth pe-
riod.
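The buffering behavior described in G3 can be sketched as follows. This is only an illustration under simplifying assumptions: timing (the completion time α) is not modeled, states and messages are plain integers, and the class name RealTimeMachine and the accumulate step function are hypothetical.

```python
from typing import Callable, List, Tuple

State = int
Message = int
Step = Callable[[State, List[Message]], Tuple[State, List[Message]]]


class RealTimeMachine:
    """Buffers messages during a period and consumes them at the next rising edge."""

    def __init__(self, initial_state: State, step: Step) -> None:
        self.state = initial_state
        self.step = step
        self.buffer: List[Message] = []
        self.period = 0

    def receive(self, message: Message) -> None:
        # Messages arriving during the current period are only buffered.
        self.buffer.append(message)

    def rising_edge(self) -> List[Message]:
        # Read the buffer, compute, transition, and return the output messages.
        inputs, self.buffer = self.buffer, []
        self.state, outputs = self.step(self.state, inputs)
        self.period += 1
        return outputs


def accumulate(state: State, inputs: List[Message]) -> Tuple[State, List[Message]]:
    # Hypothetical application logic: add up the inputs and send the total.
    total = state + sum(inputs)
    return total, [total]


machine = RealTimeMachine(0, accumulate)
machine.receive(2)
machine.receive(3)
print(machine.rising_edge(), machine.state, machine.period)
```

The point of the sketch is that receive never changes the state; only the rising edge does, so the order in which messages arrive within a period cannot influence when the transition happens.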
To simplify notation, we assume that a (global
computation) state machine sends messages to and
receives messages from every machine at each state.
When there is no physical message, this is modeled
as sending/receiving a message with the “null”
value, which has no effect on the computation.
A CPS system may interact with the external en-
vironment. When the environment sends a message
to two replicated state machines, the network de-
lays may be different. This can result in one ma-
chine receiving the data at clock period j, while the
other receives it at clock period j + 1, even if their
clocks are perfectly synchronized. The inconsistent
views may lead to the divergence of the state ma-
chines. To avoid this problem, we need an environ-
ment message I/O synchronizer.
G4: Environment Message I/O Synchronizer.
Let the input synchronizer, $M^{I\_syn}$, be a
real time machine as defined by G3. Messages from
the environment are sent to the input synchronizer.
Messages arriving at $M^{I\_syn}$ during
$\uparrow(C^{I\_syn} = j) \le t < \uparrow(C^{I\_syn} = j + 1)$
are buffered. At $\uparrow(C^{I\_syn} = j + 1)$, the input
synchronizer reads buffered messages and forwards
them to their destinations. When machines need to
send messages to the external environment, they send
them to the output synchronizer. Similarly, the
output synchronizer, $M^{O\_syn}$, reads messages at the
rising edge of its clock tick and forwards them to
the environment.
The output synchronizer allows an external observer
to have a synchronous view of the distributed states
of a global computation.
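A minimal sketch of the input synchronizer in G4 is given below. It assumes that destination buffers can be represented as plain lists and ignores network delays, which the protocol bounds separately. The class name InputSynchronizer and the message string are hypothetical.

```python
from typing import List


class InputSynchronizer:
    """Buffers environment messages and forwards them as one batch per rising edge."""

    def __init__(self, destinations: List[List[str]]) -> None:
        self.destinations = destinations  # one inbox (list) per destination machine
        self.buffer: List[str] = []

    def receive(self, message: str) -> None:
        # Environment messages are held until the next rising edge.
        self.buffer.append(message)

    def rising_edge(self) -> None:
        # Forward the same batch to every destination, then start a new batch.
        batch, self.buffer = self.buffer, []
        for inbox in self.destinations:
            inbox.extend(batch)


left_inbox: List[str] = []
right_inbox: List[str] = []
sync = InputSynchronizer([left_inbox, right_inbox])
sync.receive("SwitchPrimary=1")
held_back = (left_inbox == [] and right_inbox == [])
sync.rising_edge()
print(held_back, left_inbox, right_inbox)
```

Because both replicas receive the identical batch released at a single rising edge, the environment can no longer split them across two clock periods by itself.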
We note that a networked computer is typically
shared by state machines for both global compu-
tation and local computation. Local computation
state machines can perform their local I/O indepen-
dently of the PALS protocol. For example, a local
servo controller reads the states of the local physi-
cal device, compares them to the setpoint provided
by the supervisory controller, computes the control
commands, and sends them to the device at a rate
that is typically higher than the PALS clock rate
used for supervisory control.
G5: The Period of Perfectly Synchronized
Clocks. Given a set of perfectly synchronized
clocks used to drive global computation, the clock
period T should satisfy the constraint T > αmax +
qmax + µmax.
We now state the properties of a distributed real
time system driven by perfectly synchronized clocks
modeled by G1, G2, G3, G4 and G5.
Fact 1. Under a set of perfectly synchronized
clocks, Ci, 1 ≤ i ≤ N , defined by G5, when a ma-
chine Ms sends a message during period Cs = j,
this message will reach all the N receiving machines,
Mr, 1 ≤ r ≤ N , before their next clock ticks at
↑ (Cr = j + 1), 1 ≤ r ≤ N . That is, a message sent
during sender’s local clock’s jth clock period will be
received when receiving machines are still in their
jth period.
Figure 1: A System Using Perfectly Synchronized
Clocks
Proof. As illustrated by Figure 1, for any pair
of sender Ms and receiver Mr, the distance between
the sender’s rising clock edge at period j and the
receiver’s rising clock edge at the next period j + 1
is T. That is, ↑(Cr = j + 1) − ↑(Cs = j) = T. A
message sent in period j leaves the sender no later
than ↑(Cs = j) + αmax + qmax and takes at most µmax
to arrive. Since T > αmax + qmax + µmax by G5,
Fact 1 follows.
Fact 2. Under a set of perfectly synchronized
clocks, Ci, 1 ≤ i ≤ N , with their period T de-
fined by G5, any message from external environ-
ment received by input synchronizer during jth pe-
riod, will reach each of the N receiver machines,
Mr, 1 ≤ r ≤ N , during (j + 1)th period.
Proof. By G4, any message from the environment
must be sent to the input synchronizer. The messages
arriving at time
$\uparrow(C^{I\_syn} = j) \le t < \uparrow(C^{I\_syn} = j + 1)$
will be buffered at the synchronizer $M^{I\_syn}$. By G4,
these messages will be forwarded at
$t = \uparrow(C^{I\_syn} = j + 1)$. By Fact 1, the messages
will reach all the N receiver machines by
$\uparrow(C_r = j + 2)$, $1 \le r \le N$. Fact 2 follows.
We now examine a networked real time system
where global computations are driven by clocks with
bounded skews.
PALS Clocks. All the local clocks used by the
PALS protocol for global computation are
synchronized with the global clock with skews of at
most ε.
Figure 2: Logical Equivalence to Causality Viola-
tion
We now examine the effect of clock skews. In
Figure 2, M1 is a replica of M2. First, M1 sends a
message to M2 at time t = ↑(C1 = j) + α + q. At
global time t = ↑(C1 = j), M2’s local time is at
↑(C2 = j) − 2ε. Suppose that the end to end delay
from M1 to M2 is very short, that is, α + q + µ < 2ε.
Under this condition, M1’s message transmitted
during clock period j may reach M2 when M2 is still
at clock period j − 1. If the clocks were perfectly
synchronized, this could only happen through a
violation of causality. Since M1 is a replica of M2,
this simulates sending a message to one’s own past.
Finally, note that M2 also sends a message to M1 at
its jth period at t = ↑(C2 = j) + α + q. This message
is received by M1 at its jth period. Inconsistent
views between replicated machines may lead to state
divergence.
Figure 3: Clock C1 leads Clock C2
To make sure that M1’s message sent during
its jth period will arrive at M2 no earlier than
↑(C2 = j), we introduce a minimal network queueing
delay threshold H. As we can see in Figure 3,
if M1 sends its message at or after ↑(C1 = j) + H,
where H = 2ε − µmin, then the message will arrive
at M2 no earlier than ↑(C2 = j). So the first
rule of the PALS protocol, called the PALS Causality
Rule, requires that machine Mi at its local PALS
clock period j transmit a message no earlier than
↑(Ci = j) + H. Since the time lag between any pair
of machines is less than or equal to 2ε, under the
PALS Causality Rule a message sent from a machine
Ms during its jth period cannot reach a receiving
machine Mr earlier than ↑(Cr = j).
G6: PALS Causality Rule. A machine Mi at
(PALS) clock period j cannot send a message earlier
than ↑(Ci = j) + H, where H = 2ε − µmin.
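The Causality Rule can be illustrated with a small sketch, writing ε for the clock skew bound. The function names output_hold_time and may_transmit, the representation of time as "elapsed time since the sender's rising edge", and the numeric values are assumptions made for illustration only.

```python
def output_hold_time(eps: float, mu_min: float) -> float:
    """H = 2*eps - mu_min, following rule G6."""
    return 2.0 * eps - mu_min


def may_transmit(elapsed: float, eps: float, mu_min: float) -> bool:
    """True if a machine may put a message on the network.

    elapsed is the time since the rising edge of the machine's current
    PALS clock period (an illustrative convention, not from the paper).
    """
    return elapsed >= output_hold_time(eps, mu_min)


EPS = 1.0     # skew bound (illustrative)
MU_MIN = 0.5  # minimum network transmission delay (illustrative)

print(output_hold_time(EPS, MU_MIN))   # hold time H
print(may_transmit(1.0, EPS, MU_MIN))  # too early: still inside the hold time
print(may_transmit(1.5, EPS, MU_MIN))  # allowed: hold time has elapsed
```

If the result of output_hold_time is zero or negative, the network's minimum delay alone already covers the worst-case lag of 2ε, and the rule imposes no additional waiting.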
We now examine the case where machine M2
leads machine M1 by 2ε, as illustrated in Figure 4.
Figure 4: Clock C2 leads Clock C1
Since clock C1 now lags clock C2, we need to
ensure that a message sent by M1 at its jth period
will reach M2 before ↑(C2 = j + 1). As illustrated
in Figure 4, the latest instant at which M1 can
transmit its messages on the network is
max(↑(C1 = j) + H, ↑(C1 = j) + αmax + qmax).
The maximal network transmission delay is µmax.
Hence, it is necessary that the clock period
T > 2ε + max(αmax + qmax, H) + µmax. We now define
the PALS clock period.
G7: PALS Clock Period. PALS clock period
T > 2ε + max(αmax + qmax, H) + µmax.
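To make the two period constraints concrete, the sketch below evaluates the G5 bound (perfectly synchronized clocks) and the G7 bound (PALS clocks) for one set of parameters, writing ε for the skew bound. The function names and all numeric values are illustrative assumptions and are not measurements from the paper.

```python
def sync_period_bound(alpha_max: float, q_max: float, mu_max: float) -> float:
    """G5: the period T must exceed alpha_max + q_max + mu_max."""
    return alpha_max + q_max + mu_max


def pals_period_bound(eps: float, alpha_max: float, q_max: float,
                      mu_min: float, mu_max: float) -> float:
    """G7: the period T must exceed 2*eps + max(alpha_max + q_max, H) + mu_max."""
    hold = 2.0 * eps - mu_min  # H from rule G6
    return 2.0 * eps + max(alpha_max + q_max, hold) + mu_max


# Illustrative parameters in arbitrary time units.
EPS, ALPHA_MAX, Q_MAX, MU_MIN, MU_MAX = 1.0, 2.0, 1.0, 0.5, 1.5

print(sync_period_bound(ALPHA_MAX, Q_MAX, MU_MAX))
print(pals_period_bound(EPS, ALPHA_MAX, Q_MAX, MU_MIN, MU_MAX))
```

For these values the bounds are 4.5 and 6.5. Whenever αmax + qmax is at least H, as here, the PALS bound exceeds the synchronous bound by exactly 2ε.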
The definitions so far allow us to define a
PALS system.
PALS Definition. A PALS system consists of
state machines, environment input synchronizer, en-
vironment output synchronizer, and PALS clocks
defined by rules G1, G2, G3, G4, G6 and G7.
Fact 3. Under PALS rules, a message sent dur-
ing sender’s jth clock period will be received by ma-
chines when they are still in their jth clock period.
Proof. Suppose that Fact 3 is false. There are
two possible cases.
Case 1. Assume that there exists a pair of ma-
chines, where M1’s message sent during its clock
C1’s jth period reaches M2, during M2’s clock C2’s
(j − 1)th period.
Proof of Case 1. As illustrated in Figure 3, in
order for machine M2 to receive M1’s period j
message at M2’s clock period j − 1, M2’s clock must
lag M1’s clock, and the maximal lag is 2ε. Due to
the PALS Causality Rule (G6), the earliest possible
message arrival time at M2, tarr, is
\[
\begin{aligned}
t_{arr} &= \uparrow(C_1 = j) + H + \mu_{min} \\
        &= \uparrow(C_1 = j) + (2\epsilon - \mu_{min}) + \mu_{min} \\
        &= \uparrow(C_1 = j) + 2\epsilon
\end{aligned}
\tag{1}
\]
However, M2 lags M1 by at most 2ε. It follows that
\[
\uparrow(C_2 = j) - \uparrow(C_1 = j) \le 2\epsilon
\tag{2}
\]
Substituting (1) into (2), we have
\[
\uparrow(C_2 = j) - \uparrow(C_1 = j) \le t_{arr} - \uparrow(C_1 = j)
\]
Hence, ↑(C2 = j) ≤ tarr. This contradicts the
assumption of Case 1.
Case 2. Assume that there exists a pair of ma-
chines, where a message from M1 sent during its
clock C1’s jth period reaches M2 during its clock
C2’s (j + 1)th period.
Proof of Case 2. As illustrated in Figure 4, to
maximize the chance of machine M2 receiving M1’s
jth message at M2’s (j + 1)th clock period, C1 should
lag C2 by the maximum 2ε.
The latest message arrival time at machine M2,
tarr, is:
\[
t_{arr} = \uparrow(C_1 = j) + \max(\alpha_{max} + q_{max}, H) + \mu_{max}
\tag{3}
\]
The starting time of machine M2’s clock period
j + 1 is
\[
\uparrow(C_2 = j + 1) = \uparrow(C_2 = j) + T
                      = (\uparrow(C_1 = j) - 2\epsilon) + T
\]
Since T > max(αmax + qmax, H) + µmax + 2ε,
\[
\begin{aligned}
\uparrow(C_2 = j + 1)
  &> (\uparrow(C_1 = j) - 2\epsilon)
     + (2\epsilon + \max(\alpha_{max} + q_{max}, H) + \mu_{max}) \\
  &= \uparrow(C_1 = j) + \max(\alpha_{max} + q_{max}, H) + \mu_{max}
\end{aligned}
\tag{4}
\]
Subtracting Equation 3 from Equation 4, we have
↑(C2 = j + 1) − tarr > 0. That is, the message
arrives at machine M2 before ↑(C2 = j + 1). This
contradicts the assumption of Case 2. By the proofs
of Case 1 and Case 2, Fact 3 follows.
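As an informal sanity check of Fact 3 (not a substitute for the proof), the sketch below enumerates a small grid of clock offsets and delays and verifies that a message sent in period j always arrives in the receiver's period j when the hold time H is respected. It assumes rising edges at offset + j·T with offsets in [−ε, ε]; the grid values, the period T = 7.0 and the function names are illustrative assumptions.

```python
import itertools
import math

EPS, T = 1.0, 7.0            # skew bound and a period above the G7 bound of 6.5
ALPHA_Q = (0.5, 1.0, 3.0)    # sampled values of alpha + q (maximum 3.0)
MU = (0.5, 1.0, 1.5)         # sampled network transmission delays
OFFSETS = (-EPS, 0.0, EPS)   # sampled clock offsets within the skew bound
H = 2.0 * EPS - min(MU)      # hold time from rule G6


def receiver_period(sender_offset, receiver_offset, j, send_delay, mu):
    """Receiver's period index at the moment the message arrives."""
    arrival = sender_offset + j * T + send_delay + mu
    return math.floor((arrival - receiver_offset) / T)


def all_in_period(j, use_hold):
    """Check every sampled combination; True if all arrivals fall in period j."""
    for s, r, aq, mu in itertools.product(OFFSETS, OFFSETS, ALPHA_Q, MU):
        delay = max(aq, H) if use_hold else aq
        if receiver_period(s, r, j, delay, mu) != j:
            return False
    return True


print(all_in_period(3, use_hold=True), all_in_period(3, use_hold=False))
```

On this grid the check passes with the hold time and fails without it. The failing combination is a leading sender with a short end to end delay, which is the scenario shown in Figure 2.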
Fact 4. Under the PALS protocol, messages from
the environment buffered by the Input Synchronizer
during clock period j will reach all N destination
machines at time t, ↑ (Cr = j + 1) ≤ t <↑ (Cr =
j + 2), 1 ≤ r ≤ N .
Proof. Similar to the proof of Fact 3.
3. Proof of PALS Equivalence
In this section, we show that global computations
under the PALS protocol and under perfectly
synchronized clocks are equivalent. Because we need
to compare machines’ behaviors under perfectly
synchronized clocks and under the PALS protocol, we
add “g” to the superscripts of variables representing
state machines under perfectly synchronized clocks,
and we add “p” to the superscripts of variables
representing state machines under PALS clocks.
Let $M_i$, $1 \le i \le N$, be a group of distributed
state machines engaged in global computation. When
machine $M_i$ is driven by perfectly synchronized
clock $C_i$, we denote the machine as $M_i^g$. When
machine $M_i$ is driven by PALS clock $C_i$ with
bounded skew $\epsilon$, we denote the machine as
$M_i^p$. Machine $M_i^p$ follows the PALS Causality
Rule.
Let $M_i^g(j)$ be the state of machine $M_i$ under a
perfectly synchronized clock at period $C_i = j$,
$j = 0, 1, 2, \ldots, L$. Let $M_i^p(j)$ be the state of
machine $M_i$ under a PALS clock at period $C_i = j$,
$j = 0, 1, 2, \ldots, L$. State 0 and state L are the
initial state and the last state before termination,
respectively.
Let $I_i^g(j)$ be the set of messages received by
$M_i^g$ from machines other than the input
synchronizer during state $M_i^g(j)$, $j = 0, \ldots, L$.
Let $I_i^p(j)$ be the set of messages received by
$M_i^p$ from machines other than the input
synchronizer during state $M_i^p(j)$, $j = 0, \ldots, L$.
In addition, let $I_i^{g\_syn}(j)$ be the messages
received by $M_i^g$ from the input synchronizer during
state $M_i^g(j)$, $j = 0, \ldots, L$. Let $I_i^{p\_syn}(j)$
be the messages received by $M_i^p$ from the input
synchronizer during state $M_i^p(j)$, $j = 0, \ldots, L$.
Let $O_i^g(j)$ be the set of messages sent by $M_i^g$
to machines other than the output synchronizer
during state $M_i^g(j)$, $j = 0, \ldots, L$. Let $O_i^p(j)$
be the set of messages sent by $M_i^p$ to machines
other than the output synchronizer during state
$M_i^p(j)$, $j = 0, \ldots, L$. In addition, let
$O_i^{g\_syn}(j)$ be the messages sent by $M_i^g$ to the
output synchronizer during state $M_i^g(j)$,
$j = 0, \ldots, L$. Let $O_i^{p\_syn}(j)$ be the messages
sent by $M_i^p$ to the output synchronizer during state
$M_i^p(j)$, $j = 0, \ldots, L$.
Assumption 1. At each clock period, identical
messages are received by the input synchronizer
for perfectly synchronized clocks and by the input
synchronizer for PALS clocks. That is,
$I^{g\_syn}(j) = I^{p\_syn}(j)$, $j = 0, 1, \ldots, L$.
Note that the initial messages from the environment
are buffered by the input synchronizer and will
be delivered at $\uparrow(C^{g\_syn} = 1)$ and
$\uparrow(C^{p\_syn} = 1)$ respectively.
Assumption 2. Machine $M_i$’s initial states under
a PALS clock and a perfectly synchronized clock
are identical, that is, $M_i^g(0) = M_i^p(0)$,
$1 \le i \le N$. In addition, initial messages
pre-deposited in the input buffers are the same. That
is, $I_i^g(0) = I_i^p(0)$, $1 \le i \le N$.
Assumption 3. Each machine, Mi, 1 ≤ i ≤ N ,
is deterministic and time invariant.
Assumption 4. The perfectly synchronized
clocks $C_i^g$ and the PALS clocks $C_i^p$ have the
same period T as defined by G7.
Theorem 1. Under Assumptions 1, 2, 3 and 4, a
system under the PALS protocol is logically
equivalent to the same system driven by perfectly
synchronized clocks. That is, for each state
$j = 0, 1, \ldots, L$, we have identical states
$M_i^g(j) = M_i^p(j)$; identical inputs from state
machines $I_i^g(j) = I_i^p(j)$; identical outputs to
state machines $O_i^g(j) = O_i^p(j)$; and identical
outputs to output synchronizers
$O_i^{g\_syn}(j) = O_i^{p\_syn}(j)$.
Proof.
By Assumption 2, machine $M_i$ under the PALS
clock $C_i^p$ and under a perfectly synchronized clock
$C_i^g$ has identical initial states and identical
initial inputs. That is, identical states
$M_i^g(0) = M_i^p(0)$, $1 \le i \le N$; and identical
initial (null) inputs, i.e., $I_i^g(0) = I_i^p(0)$,
$1 \le i \le N$. By Assumption 3, machine $M_i$ under
a PALS clock and under a perfectly synchronized
clock makes identical state transitions and generates
identical outputs to state machines and to the
output synchronizers. That is,
$M_i^g(1) = M_i^p(1)$, $1 \le i \le N$, and
$O_i^g(0) = O_i^p(0)$, $O_i^{g\_syn}(0) = O_i^{p\_syn}(0)$,
$1 \le i \le N$. Hence, at the initial state, machine
$M_i$ under a PALS clock and under a perfectly
synchronized clock are logically equivalent.
Assume that $M_i^g$ and $M_i^p$ are logically
equivalent at state k, where k > 0. By this
assumption, machine $M_i$ under PALS clock $C_i^p$ and
under a perfectly synchronized clock $C_i^g$ has
identical states $M_i^g(k) = M_i^p(k)$, $1 \le i \le N$,
and has identical inputs, $I_i^g(k) = I_i^p(k)$,
$1 \le i \le N$, from the other state machines. In
addition, by Assumption 1, the environmental input
messages to $M_i$ under PALS clocks and under
perfectly synchronized clocks are the same at state
k − 1. That is, $I^{g\_syn}(k - 1) = I^{p\_syn}(k - 1)$.
By Fact 2 and Fact 4, the same environmental input
messages will be in the buffers of $M_i^g(k)$ and
$M_i^p(k)$, $1 \le i \le N$, during their kth state.
Since machine $M_i$ under a PALS clock and under
a perfectly synchronized clock at state k has
identical states, identical environmental inputs, and
identical inputs from state machines, by Assumption 3
machine $M_i$ under a PALS clock and under
a perfectly synchronized clock makes identical
state transitions and generates identical outputs
to state machines and to the output synchronizer.
That is, $M_i^g(k + 1) = M_i^p(k + 1)$, $1 \le i \le N$,
and $O_i^g(k) = O_i^p(k)$,
$O_i^{g\_syn}(k) = O_i^{p\_syn}(k)$, $1 \le i \le N$.
By Fact 1 and Fact 3, these outputs will arrive at
their receiving machines before the next state. In
addition, by Assumption 1, Fact 2 and Fact 4, the
environmental inputs to $M_i^g(k + 1)$ and
$M_i^p(k + 1)$, $1 \le i \le N$, are the same.
Since machine $M_i$ under a PALS clock and under
a perfectly synchronized clock at state k + 1 has
identical states, identical environmental inputs, and
identical inputs from state machines, by Assumption 3
machine $M_i$ under a PALS clock and under a
perfectly synchronized clock makes identical
state transitions and generates identical
outputs. That is, $M_i^g(k + 2) = M_i^p(k + 2)$,
$1 \le i \le N$, and $O_i^g(k + 1) = O_i^p(k + 1)$,
$O_i^{g\_syn}(k + 1) = O_i^{p\_syn}(k + 1)$,
$1 \le i \le N$. It follows that $M_i$, $1 \le i \le N$,
under a PALS clock $C_i^p$ and under a perfectly
synchronized clock $C_i^g$ are logically equivalent
at state k + 1. By induction, Theorem 1 follows.
Corollary 1.1. The bound on the PALS clock
period, (2ε + max(αmax + qmax, H) + µmax), is tight.
Proof. As illustrated in Figure 3, since the clock
skew is bounded by ε, a sender M1 can lead the
receiver M2 by at most 2ε. If a sender’s output hold
time is less than H = 2ε − µmin as defined by
the PALS protocol, a message from the sender at time
$\uparrow(C_1^p = j)$ can reach the receiver before
$\uparrow(C_2^p = j)$. Hence, H cannot be shortened.
As illustrated in Figure 4, sender M1 may lag
the receiver M2 by 2ε. In this case, the latest
instant at which the sender M1 transmits a message
on the network is
$\uparrow(C_1^p = j) + \max(\alpha_{max} + q_{max}, H)$.
The longest network transmission delay is µmax.
In order to reach M2 before $\uparrow(C_2^p = j + 1)$,
the PALS clock period T must satisfy the inequality
T > 2ε + max(αmax + qmax, H) + µmax. Corollary
1.1 follows.
Corollary 1.2. The PALS protocol uses mini-
mal messages to achieve logical synchronization.
Proof. By the rules of the PALS protocol, PALS
does not use any synchronization message at all.
Corollary 1.2 follows.
We have shown that global computation un-
der PALS clocks and under perfectly synchronized
clocks are logically equivalent. Hence, we say that a
networked embedded system under the PALS pro-
tocol is logically synchronous. However, logical syn-
chrony is different from physical synchrony. When
the clocks have bounded skews, the I/O’s performed
by distributed nodes have relative I/O jitters. In
a system driven by perfectly synchronized clocks,
each machine can perform input and output at ex-
actly the same time. The PALS protocol should
only be used in global computations, where rela-
tive I/O timing jitters are tolerable by the appli-
cations in question. If such jitters are not accept-
able, then neither PALS nor any other form of net-
worked real time systems can be used. This is be-
cause any method M that can reduce I/O jitters
can also be used by a PALS based system, provided
that method M does not violate PALS Causality
rule (G6). Finally, if the I/O jitters occur in contin-
uous control, for example, during the adjustment of
setpoints of engine speeds and control surface an-
gles, they translate to control errors. If such errors
are modest, they can be compensated by feedback
controls.
4. Example Application
Although a study of PALS application has been
conducted [19], due to the page limitation, we will
illustrate PALS using a greatly simplified example
based on [11]. This dual redundant flight guidance
system (FGS) example has two physical sides
corresponding to the left and right sides of the
aircraft. Each FGS is connected to two redundant real
time networks. Each
message will be sent on both networks. So the fail-
ure of a single network will not affect the operations
of FGS. The FGS periodically compares the current
state of the aircraft (speed, altitude and position) to
the desired setpoints and then generates pitch and
roll guidance commands. The FGS receives input
about the current state of the aircraft from differ-
ent subsystems, such as the Air Data System (ADS)
and Flight Management System (FMS). The FGS
supplies the pitch and roll setpoints to the
autopilot (AP) and displays them on the Primary Flight
Display (PFD). The pilot interacts with the FGS
through the Flight Control Panel (FCP) to perform
discrete control, e.g., changing the modes. The FGS
can operate in different modes: dependent and inde-
pendent mode. In the dependent mode (hot standby
mode) only one FGS remains primary (pilot flying
side), while the other operates as a hot standby. In
the independent mode, both can independently op-
erate as the navigational source.
In the hot standby mode, the two sides synchronize
their computations with each other. To illustrate
the application of PALS, we will focus on the
discrete global computation that performs the
synchronization logic between these systems. During
the operation of the FGS, one side may fail and then
be restarted if it is recoverable. If the primary
fails, then the
standby will automatically become the primary and
the former primary that has failed will restart as the
standby. On the other hand, a pilot may issue a
command to switch the primary and standby. However,
the pilot’s “switch primary” command will be
ignored if it would make the failed side the primary.
Note that in the hot standby configuration, when
the primary fails, it will take one step for the
standby to detect the failure and become the primary
in the next step. From a control perspective, this
means that supervisory control of setpoint
adjustments could be
missing for one period. While this can be compen-
sated in control design, it is still undesirable. This
is the reason why PALS provides the lower bound
for supervisory controller periods.
We now give a concrete example to illustrate the
PALS design vs GALS design under the following
failure model.
1. Assumption 5: Fail-stop. Machines are fail-
stop and may recover.
2. Assumption 6: No Concurrent Failure.
The probability of concurrent failures of both
machines is negligible, so is the probability of
concurrent failures of both real time networks.
There are two design requirements.
1. Unique Primary. There can be only one pri-
mary at any time.
2. Bounded No Primary Duration. For each
machine failure, there can be at most 1 period
during which there is no primary.
Figure 5: Dual FGS system
Figure 5 shows our simplified model of the dual FGS system.
The model consists of three components: the left FGS, the
right FGS, and an environment event Input Synchronizer. In
this example, the only environment event delivered to the two
FGS sides is SwitchPrimary. The pilot can send a
SwitchPrimary = 1 command to change the primary FGS. In the
absence of a pilot command, the Input Synchronizer sends the
value 0 (zero). We assume that an FGS can fail, and we inject
failures using two boolean inputs, FailLeft and FailRight. For
example, if FailLeft = 1, then the left FGS fails in the next
state. Additionally, we assume that the two FGS sides cannot
fail simultaneously.
In our model, the two FGS sides exchange heartbeats for failure
detection. If FailLeft = 1, then LeftHeartbeat = 0 in the next
state. A heartbeat is sent after the FGS has successfully
completed its computation and application I/O. We assume that
a heartbeat sent in period j reaches the other side in period j
and is read at ↑(C = j + 1). In our design, the standby also
sends a heartbeat to the primary. This prevents the pilot from
switching the primary to a side that is known to have failed.
If an FGS is primary, then its status variable (e.g.,
LeftIsPrimary) is set to 1 (one). Thus, the system must
satisfy the predicate that exactly one FGS is primary, which
is represented by
(LeftIsPrimary = 1 ∧ RightIsPrimary = 0) ∨
(LeftIsPrimary = 0 ∧ RightIsPrimary = 1)
Analysis of the System with Perfectly Syn-
chronized Clocks
In this case, the distributed state machines of these three
components operate in lockstep. Before we can prove that this
ideal system satisfies the two aforementioned requirements, we
must define how each FGS determines whether it is primary or
standby. We use two state variables, (LeftIsAlive,
LeftIsPrimary), to represent whether the left FGS is alive and
whether it is primary or standby, and similarly for the right
FGS. We initialize both sides as alive, the left FGS as
primary, and the right FGS as standby. That is,
LeftIsAlive = RightIsAlive = 1;
(LeftIsPrimary = 1) ∧ (RightIsPrimary = 0)
We specify the behavior of the left FGS using the truth table
given in Table 1. The right FGS behaves identically, except
that the left FGS is initialized as the primary and the right
FGS is initialized as the standby.

     Input                                    Current state               Next state
Row  FailLeft  SwitchPrimary  RightHeartbeat  LeftIsAlive  LeftIsPrimary  LeftIsAlive  LeftIsPrimary
1    1         D              D               D            D              0            0
2    0         D              D               0            D              1            0
3    0         D              0               1            D              1            1
4    0         0              1               1            0              1            0
5    0         0              1               1            1              1            1
6    0         1              1               1            0              1            1
7    0         1              1               1            1              1            0
Table 1: State transitions of left FGS. (“D” represents don’t care)

Note that a failed FGS cannot become the primary FGS
immediately after recovering from a failure (as shown in Row 2
of Table 1). Otherwise, the unique primary requirement would
be violated. For example, suppose that at state i the right
FGS is the primary, and the left FGS is the standby and fails.
At state i + 1, a SwitchPrimary command arrives. In addition,
the left FGS recovers in time and becomes active during state
i + 1. However, the heartbeat sent by the left FGS will not be
read by the right FGS until the beginning of state i + 2.
Hence, during state i + 1 the right FGS ignores the
SwitchPrimary command, because the heartbeat from the left FGS
for state i was missing. If we let the left FGS become
primary, then we have two primaries. If the pilot then issues
another SwitchPrimary command, both become standby.
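The scenario above can be replayed in a few lines of Python. This is a minimal sketch, not the NuSMV or Maude model: fgs_step encodes the rows of Table 1, while the run helper, the recover_as_standby flag, and the encoding of a heartbeat as 1 - fail are assumptions introduced here for illustration. Setting recover_as_standby=False gives a hypothetical variant of Row 2 in which a recovering side obeys SwitchPrimary, which is the design the text argues against.

```python
def fgs_step(fail, switch, peer_heartbeat, is_alive, is_primary,
             recover_as_standby=True):
    """Next (is_alive, is_primary) for one FGS, following Table 1.

    recover_as_standby=False is a hypothetical variant of Row 2 (not the
    paper's design) in which a recovering side obeys SwitchPrimary.
    """
    if fail:                # Row 1: a failing side is neither alive nor primary
        return 0, 0
    if not is_alive:        # Row 2: recovery
        return 1, (0 if recover_as_standby else switch)
    if not peer_heartbeat:  # Row 3: peer is silent, so be the primary
        return 1, 1
    return 1, is_primary ^ switch  # Rows 4-7: toggle on SwitchPrimary = 1


def run(inputs, recover_as_standby):
    """Run both sides in lockstep; return the (LeftIsPrimary, RightIsPrimary) trace."""
    la, lp, lhb = 1, 0, 1  # left starts as standby, as in the example in the text
    ra, rp, rhb = 1, 1, 1  # right starts as primary
    trace = [(lp, rp)]
    for fail_left, fail_right, switch in inputs:
        nla, nlp = fgs_step(fail_left, switch, rhb, la, lp, recover_as_standby)
        nra, nrp = fgs_step(fail_right, switch, lhb, ra, rp, recover_as_standby)
        # Assumed heartbeat model: a heartbeat sent in this period is read at
        # the start of the next one, and a failing side sends none.
        lhb, rhb = 1 - fail_left, 1 - fail_right
        la, lp, ra, rp = nla, nlp, nra, nrp
        trace.append((lp, rp))
    return trace


# One (FailLeft, FailRight, SwitchPrimary) triple per period:
# state i: left fails; state i+1: SwitchPrimary; state i+2: SwitchPrimary again.
scenario = [(1, 0, 0), (0, 0, 1), (0, 0, 1)]
table1_trace = run(scenario, recover_as_standby=True)
variant_trace = run(scenario, recover_as_standby=False)
print("Table 1:", table1_trace)
print("Variant:", variant_trace)
```

Under these assumptions, table1_trace keeps exactly one primary in every state, while variant_trace passes through (1, 1) and then (0, 0), the two-primaries and both-standby outcomes described above.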
We have modeled the design specified by Table 1 using both
NuSMV [17] and Maude [18], and we verified that this
synchronous design meets both requirements. Both models can be
found at
https://agora.cs.illinois.edu/download/attachments/9527/RTSS09_PALS.zip.
In a separate paper, we will describe in detail how to
mechanically map a synchronous design specified in AADL onto a
networked system with bounded clock skews and automatically
check whether the PALS protocol is followed.
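Because the synchronous design has only a handful of states, the flavor of this verification can be shown with an exhaustive reachability search in Python. This is a sketch under stated assumptions, not the NuSMV or Maude models referenced above. The state tuple layout, the heartbeat encoding (1 - fail), and the initial heartbeat values are assumptions. Requirement 1 is read here as "never two primaries", and Requirement 2 as "a period with no new failure always ends with exactly one primary".

```python
from itertools import product


def fgs_step(fail, switch, peer_heartbeat, is_alive, is_primary):
    """Next (is_alive, is_primary) for one FGS, following Table 1."""
    if fail:                # Row 1
        return 0, 0
    if not is_alive:        # Row 2: recover as standby
        return 1, 0
    if not peer_heartbeat:  # Row 3: peer is silent, so be the primary
        return 1, 1
    return 1, is_primary ^ switch  # Rows 4-7


def system_step(state, fail_left, fail_right, switch):
    """Lockstep transition of both sides; state is (la, lp, lhb, ra, rp, rhb)."""
    la, lp, lhb, ra, rp, rhb = state
    nla, nlp = fgs_step(fail_left, switch, rhb, la, lp)
    nra, nrp = fgs_step(fail_right, switch, lhb, ra, rp)
    # Assumed heartbeat model: a side that fails in this period sends none.
    return (nla, nlp, 1 - fail_left, nra, nrp, 1 - fail_right)


# Both alive, left primary, right standby; initial heartbeats of 1 are assumed.
INITIAL = (1, 1, 1, 1, 0, 1)
# All (FailLeft, FailRight, SwitchPrimary) inputs without a simultaneous failure.
INPUTS = [i for i in product((0, 1), repeat=3) if not (i[0] and i[1])]


def explore():
    """Visit every reachable state, asserting both requirements on each transition."""
    seen, frontier = {INITIAL}, [INITIAL]
    while frontier:
        state = frontier.pop()
        for fail_left, fail_right, switch in INPUTS:
            nxt = system_step(state, fail_left, fail_right, switch)
            primaries = nxt[1] + nxt[4]
            # Requirement 1 (as read here): never two primaries.
            assert primaries <= 1, (state, nxt)
            # Requirement 2 (as read here): without a new failure, the next
            # state always has exactly one primary.
            if not (fail_left or fail_right):
                assert primaries == 1, (state, nxt)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


reachable = explore()
print(len(reachable), "reachable states explored without a violation")
```

The search terminates quickly because all components advance in lockstep, so no clock-skew interleavings need to be enumerated.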
Compared to the synchronous design, the GALS solution is more
complicated. To illustrate the added complexity, we apply our
simple synchronous solution directly in the GALS environment,
where clocks have bounded skews. Figure 6 shows a
counterexample in which the design fails.
Figure 6: Counterexample for the GALS solution
In this case, the Input Synchronizer delivers two SwitchPrimary
events in consecutive periods of the Input Synchronizer as a
result of the pilot’s actions. While the right FGS processes
these two events in separate periods, the left FGS receives
both SwitchPrimary events in the same period. Such
inconsistencies, which do not exist in synchronous designs but
do exist in asynchronous designs, add complexity. For example,
as illustrated on page 12 of [11], a design that works in the
synchronous case can fail in the asynchronous case, leaving
each side permanently stuck in the primary state.
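The SwitchPrimary counterexample can be made concrete with a small timing sketch in Python. The numbers are hypothetical and are not taken from Figure 6: both sides run with period 10, the left clock lags the right by 4 time units, and the two events arrive at times 5.0 and 13.5 after network delay. The sketch also assumes that each side reads a single boolean SwitchPrimary input per period, so two events that land in the same period collapse into one.

```python
def switch_inputs(ticks, arrivals):
    """SwitchPrimary input per period: 1 if any event arrived in (start, end]."""
    return [int(any(start < a <= end for a in arrivals))
            for start, end in zip(ticks, ticks[1:])]


def final_is_primary(is_primary, inputs):
    """Apply Rows 4-7 of Table 1 (both sides alive): toggle on SwitchPrimary = 1."""
    for switch in inputs:
        is_primary ^= switch
    return is_primary


# Hypothetical local clock ticks: same period, bounded skew of 4 time units.
right_ticks = [0, 10, 20, 30]
left_ticks = [4, 14, 24, 34]
# Hypothetical arrival times of two SwitchPrimary events that were sent in
# consecutive Input Synchronizer periods.
arrivals = [5.0, 13.5]

right_inputs = switch_inputs(right_ticks, arrivals)  # events fall in separate periods
left_inputs = switch_inputs(left_ticks, arrivals)    # both events fall in one period
left_final = final_is_primary(1, left_inputs)        # left starts as primary
right_final = final_is_primary(0, right_inputs)      # right starts as standby
print(left_inputs, right_inputs, (left_final, right_final))
```

With these values the right side toggles twice and the left side only once, so the two sides end up in the same role even though no failure occurred. In a lockstep execution both sides would read the same input sequence.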
With additional handshaking protocols, the problem was solved.
However, “the properties themselves are more difficult to
state, were weaker than could be achieved in the synchronous
case, and required considerable complexity to be added to the
model to ensure that even the weakened properties were
true” [11]. Another problem with the asynchronous design is
the state explosion caused by clock asynchrony. In a case
study of an active standby system, it was found that model
checking the asynchronous model took over 30 hours, while
model checking the PALS design took less than 30 seconds¹ [19].
These problems motivated the PALS research reported in this
paper.
5. Summary and Conclusion
In networked cyber physical systems, real time global
computations, such as supervisory control, require consistent
views, actions and synchronized state transitions. In the
aviation community, networked real time systems are known as
Globally Asynchronous Locally Synchronous (GALS) systems,
because the skews between local clocks can only be bounded but
not eliminated. When interactions between nodes are driven by
local clocks, the resulting interactions become asynchronous.
Discrete supervisory controls, such as commands for mode
changes, are very sensitive to asynchronous interactions. An
aircraft has hundreds of computers networked together, each
with a slightly different clock skew, and tens to hundreds of
concurrent discrete commands in the hierarchical control of
the flight control system.

¹ See Page 42.

To verify control logic in the presence of asynchronous
interactions, a model checker has to examine all possible
state transitions of each machine at each clock tick under all
possible clock skew combinations. This creates a combinatorial
explosion of the global system state space. Furthermore, in
asynchronous system designs the properties themselves are more
difficult to state and weaker, and considerable complexity
must be added to the model to ensure that even the weakened
properties hold.
This paper presents a real time logical synchrony protocol,
PALS, for distributed real time global computation. Under the
PALS protocol, engineers design and verify applications as if
all the distributed state machines were driven by a perfect
global clock. The PALS protocol is optimal in the sense that
1) the bound on the periods of real time global computation is
the shortest possible, and 2) the message overhead in
achieving logical synchrony is minimal.
References
[1] Birman, K. and Joseph, T. Exploiting virtual synchrony in distributed systems. SIGOPS Oper. Syst. Rev. 21, 5, Nov. 1987.
[2] Guo, K., Vogels, W., and van Renesse, R. Structured virtual synchrony: exploring the bounds of virtual synchronous group communication. 7th Workshop on ACM SIGOPS European Workshop: Systems Support For Worldwide Applications, Sept. 1996.
[3] Pereira, J., Rodrigues, L., and Oliveira, R. Reducing the Cost of Group Communication with Semantic View Synchrony. International Conference on Dependable Systems and Networks, June 2002.
[4] Delporte-Gallet, C. Fauconnier, C.D.I. Real-time fault-tolerant atomic broadcast. Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems, 1999.
[5] Hooman, J. Verification of Distributed Real-Time and Fault-Tolerant Protocols. Proceedings of the 6th International Conference on Algebraic Methodology and Software Technology, Dec. 1997.
[6] Peter Robinson and Ulrich Schmid. The Asynchronous Bounded-Cycle Model. Proceedings of the 10th International Symposium on Stabilization, Safety, and Security of Distributed Systems, November 2008.
[7] Tarek Abdelzaher, Anees Shaikh, Farnam Jahanian, and Kang Shin. RTCast: Lightweight Multicast for Real-Time Process Groups. IEEE Real-Time Technology and Applications Symposium, June 1996.
[8] E. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999.
[9] Ricardo Bedin Franca, Jean-Paul Bodeveix, David Chemouil, Mamoun Filali, Dave Thomas, and Jean-Francois Rolland. The AADL behaviour annex experiments and roadmap. UML&AADL’2007, 2007.
[10] Nicolas Halbwachs and Louis Mandel. Simulation and verification of asynchronous systems by means of a synchronous model. In Proceedings of the 6th International Conference on Application of Concurrency to System Design, 2006.
[11] Steven P. Miller, Mike W. Whalen, Dan O’Brien, Mats P.E. Heimdahl, and Anjali Joshi. A methodology for the design and verification of globally asynchronous/locally synchronous architectures. NASA Contractor Report NASA/CR-2005-213912.
[12] Society of Automotive Engineers. SAE standards: Architecture analysis & design language (AADL). AS5506, November 2004.
[13] P. C. Ölveczky and J. Meseguer. Abstraction and completeness for Real-Time Maude. Electronic Notes in Theoretical Computer Science, 176(4):5–27, 2007.
[14] P. C. Ölveczky and J. Meseguer. Semantics and pragmatics of Real-Time Maude. Higher-Order and Symbolic Computation, 20(1-2):161–196, 2007.
[15] Gerard Tel. Introduction to Distributed Algorithms. Cambridge University Press, 2nd edition, 2000.
[16] Software considerations in airborne systems and equipment certification, DO-178B. RTCA Inc: Washington, DC, December 1992.
[17] http://nusmv.irst.itc.it
[18] http://maude.cs.uiuc.edu
[19] Steven P. Miller and Darren Cofer. PALS for Asynchronous Design. Technical Presentation, Rockwell Collins Inc., July 15, 2008.
