Loosely Time-Triggered Architectures (LTTAs) are a proposal for constructing distributed embedded control systems. They build on the quasi-periodic architecture, where computing units execute nearly periodically, by adding a thin layer of middleware that facilitates the implementation of synchronous applications.
INTRODUCTION
This article is about implementing programs expressed as stream equations, like those written in Lustre, Signal, or the discrete subset of Simulink, over networks of embedded controllers. Since each controller is activated on its own local clock, some middleware is needed to ensure the correct execution of the original program. One possibility is to rely on a clock synchronization protocol as in the Time-Triggered Architecture (TTA) [Kopetz 2011]. Another is to use less constraining protocols as in the Loosely Time-Triggered Architecture (LTTA) [Benveniste et al. 2002, 2007; Tripakis et al. 2008].
The embedded applications that we consider involve both continuous control and discrete logic. Since the continuous layers are naturally robust to sampling artifacts, continuous components can simply communicate through shared memory without
© 2016 ACM 1539-9087/2016/08-ART71. DOI: http://dx.doi.org/10.1145/2932189
Complicated behaviors are often best described using automata whose defining equations at an instant are mode dependent. An automaton is a collection of states and transitions. Consider the following example:
Starting in state Wait, the output o is defined by the equation o = false while the condition (c = 0) is false. At the instant that this condition is true, that is, when the countdown elapses, signal s is emitted, Elapsed becomes the active state, and the output is thereafter defined by the equation o = true. 
Continuous Time
Zélus combines two models of time: discrete and continuous. Continuous time functions are introduced by the keyword hybrid. Consider a simple periodic clock that emits a signal every p seconds. Such a clock can be modeled in Zélus using a timer, a simple ODE ṫ = 1, initialized to the value −p, and similarly reinitialized whenever t reaches 0.
The variable t is initialized as described above (init −. p) and increases with slope 1.0 (der t = 1.0). The reinitialization condition is encoded as a (rising) zero-crossing expression, which a numeric solver monitors to detect and locate significant instants. At zero-crossing instants, when the last t expression monitored by the up(.) operator passes through zero from a negative value to a positive one, t is reset to the value −. p and the signal s is emitted. In a continuous context, the expression last t refers to the left-limit of signal t. It is needed here to prevent circularity (a so-called causal or algebraic loop) in the definition of t.
Discrete functions can be activated on the presence of signals produced by continuous functions:
A memory o is initialized with the value false. Then, at each of the events produced by the periodic clock periodic, the new value of o is computed by the discrete function elapsed, otherwise the last computed value is maintained.
WHAT IS AN LTTA?
An LTTA is the combination of a quasi-periodic architecture with a protocol for deploying synchronous applications. We now present the key definitions of quasi-periodic architectures (Section 3.1) and synchronous applications (Section 3.3).
Quasi-Periodic Architectures
Introduced in Caspi [2000], the quasi-synchronous approach is a set of techniques for building distributed control systems. It is a formalization of practices that Paul Caspi observed while consulting in the 1990s at Airbus, where engineers were deploying synchronous Lustre/SCADE [Halbwachs et al. 1991] designs onto networks of nonsynchronized nodes communicating via shared memories with bounded transmission delays. The quasi-synchronous approach applies to systems of periodically executed (sample-driven) nodes. In contrast to the Time-Triggered Architecture [Kopetz 2011], it does not rely on clock synchronization. Such systems arise naturally as soon as two or more microcontrollers running periodic tasks are interconnected. They are common in aerospace, power generation, and railway systems.
Definition 3.1 (Quasi-periodic Architecture). A quasi-periodic architecture is a finite set of nodes N, where every node n ∈ N executes nearly periodically, that is, (a) each node starts at t = 0, and (b) the actual time between any two successive activations T ∈ R may vary between known bounds during an execution:
$$T_{\min} \le T \le T_{\max}. \qquad (1)$$
Values are transmitted between processes with a delay τ ∈ R, bounded by τ_min and τ_max:
$$\tau_{\min} \le \tau \le \tau_{\max}. \qquad (2)$$
Each value is buffered at receivers until replaced by a newer one.
We assume without loss of generality that all nodes start executing at t = 0, since initial phase differences between nodes can be modeled by a succession of mute activations before the actual start of the system. A quasi-periodic system can also be characterized by its nominal period T_nom and maximum jitter ε, where T_min = T_nom − ε and T_max = T_nom + ε, and similarly for the transmission delay. The margins encompass all sources of divergence between nominal and actual values, including relative clock jitter, interrupt latencies, and scheduling delays. We assume that individual processes are synchronous: reactions triggered by a local clock execute in zero time (atomically with respect to the local environment).
In the original quasi-synchronous approach, transmission delays are only constrained to be "significantly shorter than the periods of read and write clocks" [Caspi 2000, Section 3.2.1]. We introduce explicit bounds in Equation (2) to make the definition more precise and applicable to a wider class of systems. They can be treated naturally in our modeling approach.
Nodes communicate through shared memories that are updated atomically. Any given variable is only updated by a single node but may be read by several nodes. The values written to a variable are sent from the producer to all consumers, where they are stored in a specific (one-place) buffer. The buffer is only sampled when the process at a node is activated by the local clock. This model is sometimes termed Communication by Sampling (CbS) [Benveniste et al. 2007].
Finally, we assume that the network guarantees message delivery and preserves message order. That is, for the latter, if message m_1 is sent before m_2, then m_2 is never received before m_1. This is necessarily the case when τ_max < T_min + τ_min; otherwise, this assumption only burdens implementations with the technicality of numbering messages and dropping those that arrive out of sequence.
Value Duplication and Loss. The lack of synchronization in the quasi-periodic architecture means that successive variable values may be duplicated or lost. For instance, if a consumer of a variable is activated twice between the arrivals of two successive messages from the producer, it will oversample the buffered value. On the other hand, if two messages of the producer are received between two activations of the consumer, the second value overwrites the first, which is then never read. These effects occur for any ε > 0, regardless of how small.
The timing bounds of Definition 3.1 mean, however, that the maximum numbers of consecutive oversamplings and overwritings are functions of the bounds on node periods and transmission delays.
PROPERTY 3.2. The maximum number of consecutive oversamplings n_os and the maximum number of consecutive overwritings n_ow are
$$n_{os} = n_{ow} = \left\lceil \frac{T_{\max} + \tau_{\max} - \tau_{\min}}{T_{\min}} \right\rceil - 1. \qquad (3)$$
PROOF. Consider a pair of nodes A and B with B receiving messages from A. In the best case, a message sent by A at time t arrives in B's shared memory at t + τ_min. Then, if A runs as slowly as possible, the next message is sent at t + T_max and arrives in B's shared memory at worst at t + T_max + τ_max. The maximal delay between two successive arrivals is thus T_max + τ_max − τ_min. At best, B is activated every T_min. The maximum number of executions n of B is thus:
$$n = \left\lceil \frac{T_{\max} + \tau_{\max} - \tau_{\min}}{T_{\min}} \right\rceil.$$
Each execution of B that occurs between the two arrivals samples the last received value. The maximum number of oversamplings n_os = n − 1 is thus given by Equation (3). The proof for the number of consecutive overwritings is similar.
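The counting argument of the proof can be checked numerically. The following is a minimal Python sketch, not part of the article's Zélus development: the function name and the sample values are hypothetical, and the use of a ceiling is an assumption about how the count n of executions is rounded.

```python
import math

def max_consecutive_oversamplings(t_min, t_max, tau_min, tau_max):
    """Bound from the proof: n - 1, where n counts the activations of the
    consumer B that fit between two successive arrivals from A.

    Assumption: n is taken as the ceiling of worst_gap / t_min.
    """
    if not (0 < t_min <= t_max and 0 <= tau_min <= tau_max):
        raise ValueError("expected 0 < t_min <= t_max and 0 <= tau_min <= tau_max")
    # Longest possible delay between two successive arrivals at B.
    worst_gap = t_max + tau_max - tau_min
    # B is activated at best every t_min.
    n = math.ceil(worst_gap / t_min)
    return n - 1

# Hypothetical timing values (any consistent time unit).
print(max_consecutive_oversamplings(9, 11, 1, 3))
```

With these sample values the worst gap between arrivals is 13 time units, so at most two activations of B fall between two arrivals and one of them oversamples. Floating-point division may need care when the ratio is an exact integer.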
This property implies that data loss can be prevented by activating a consumer more frequently than the corresponding producer, for instance, by introducing mute activations of the receiver (at the cost of higher oversampling). Quasi-periodic architectures involving such producer-consumer pairs are studied in Benveniste et al. [2002].
Quasi-periodic architectures are a natural fit for continuous control applications where the error due to sampling artifacts can be computed and compensated for. In this article, however, we treat discrete systems, like state machines, which are generally intolerant to data duplication and loss.
Signal Combinations. There is another obstacle to implementing discrete applications on a quasi-periodic architecture: naively combining variables can give results that diverge from the reference semantics. Consider, for example, Figure 1 [Caspi 2000; Benveniste et al. 2010, Section 4.2.2]. A node C reads two Boolean inputs a and b, produced by nodes A and B, respectively, and computes the conjunction, c = a ∧ b. Here, a is false for three activations of A before becoming true, and b is true for three activations of B before becoming false. In a synchronous semantics, with simultaneous activations of A, B, and C, node C should return false at each activation. But, as Figure 1 shows, the value computed depends on when each of the nodes is activated. This phenomenon cannot be avoided by changing the frequency of node activations.
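The effect can be reproduced with a few lines of Python. This is only an illustrative sketch: the arrival times and the activation instants of C are hypothetical values chosen so that C samples the fourth value of a before the fourth value of b has arrived.

```python
def sample(updates, t):
    """Value of the most recent (time, value) update at or before time t."""
    current = None
    for when, value in updates:
        if when <= t:
            current = value
    return current

a_values = [False, False, False, True, True]
b_values = [True, True, True, False, False]

# Reference (synchronous) semantics: A, B, and C activate simultaneously.
sync = [a and b for a, b in zip(a_values, b_values)]

# Hypothetical desynchronized run: the k-th value of a arrives at time k,
# the k-th value of b slightly later, at time k + 0.2.
a_updates = [(float(k), v) for k, v in enumerate(a_values)]
b_updates = [(k + 0.2, v) for k, v in enumerate(b_values)]
c_ticks = [0.3, 1.3, 2.3, 3.1, 4.3]  # activation instants of node C
desync = [sample(a_updates, t) and sample(b_updates, t) for t in c_ticks]

print(sync)    # every conjunction is False
print(desync)  # the activation at t = 3.1 sees a = True and b = True
```

At t = 3.1 node C combines the new value of a with the old value of b and computes true, a value that never occurs in the synchronous reference.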
Modeling Quasi-Periodic Architectures
One of the central ideas in the original quasi-synchronous approach is to replace a model with detailed timing behavior by a discrete abstraction [Caspi 2000, Section 3.2]. Basically, a system is modeled, for example in Lustre, as a composition of discrete programs activated by a scheduler program that limits interleaving [Halbwachs and Mandel 2006]. Now, rather than arising as a consequence of the timing constraints of Definition 3.1, properties like Property 3.2 are enforced directly by the scheduler. This approach allows the application of discrete languages, simulators, and model-checkers, but it does not apply to the present setting where "short undetermined transition delays" [Caspi 2000, Section 3.2.1] are replaced by Equation (2). In fact, Caspi knew that "if longer transmission delays are needed, modeling should be more complex" [Caspi 2000, Section 3.2.1, footnote 2]. The earliest article on LTTAs [Benveniste et al. 2002] models messages in transmission but still in a discrete model. Later articles introduce a class of protocols that rely on the timing behavior of the underlying architecture. Their models mix architectural timing constraints with protocol details using automata or ad hoc extensions of timed Petri nets. In contrast, we use Zélus, a synchronous language extended with continuous time, where we can clearly separate real-time constraints from discrete control logic but still combine both in an executable language.
Let us first consider a quasi-periodic clock that triggers the activation of an LTTA node according to Equation (1). Such a clock can be simulated in Zélus using a timer, a simple ODE ṫ = 1, initialized to an arbitrary value between −T_min and −T_max, and similarly reinitialized whenever t reaches 0. As Zélus is oriented towards simulation, we express an arbitrary delay by making a random choice.
This declares a discrete function named arbitrary with two inputs and defined by a single expression. Then, the model for node clocks is similar to the periodic clock of Section 2.2:
The variable t is initialized as described above and increases with slope 1.0. At zero-crossing instants, a signal c is emitted and t is reset.
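For readers without a Zélus toolchain, the behavior of such a clock can be approximated by an event-based sketch in Python. This is not the Zélus model: no ODE solver is involved, the function name is hypothetical, and we assume (following Definition 3.1) that the first activation occurs at t = 0 and that the arbitrary delay is drawn uniformly.

```python
import random

def quasi_periodic_ticks(t_min, t_max, horizon, rng=None):
    """Activation times in [0, horizon] of a node whose successive
    activations are separated by an arbitrary delay in [t_min, t_max]."""
    if not 0 < t_min <= t_max:
        raise ValueError("expected 0 < t_min <= t_max")
    rng = rng or random.Random()
    ticks = [0.0]  # assumption: the node is first activated at t = 0
    while True:
        # Arbitrary delay modeled as a uniform random choice (assumption).
        nxt = ticks[-1] + rng.uniform(t_min, t_max)
        if nxt > horizon:
            return ticks
        ticks.append(nxt)

# Hypothetical parameters: nominal period 1.0 with jitter 0.1.
ticks = quasi_periodic_ticks(0.9, 1.1, 10.0, random.Random(42))
print(len(ticks), ticks[:3])
```

Passing a seeded `random.Random` instance makes a run reproducible, which is convenient when comparing protocol behaviors on the same activation pattern.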
Similarly, the constraint on transmission delays from Equation (2) is modeled by delaying the discrete signal corresponding to the sender's clock. A simple Zélus model is
The function delay takes a clock c as input. When c ticks, the timer is reinitialized to an arbitrary value between −τ_min and −τ_max corresponding to the transmission delay. Then, when the delay has elapsed, that is, when a zero-crossing is detected, a signal dc for the delayed clock is emitted. The presented model is simplified for readability. In particular, it does not allow for simultaneous ongoing transmissions, that is, it mandates τ_max < T_min. The full version queues ongoing transmissions, which complicates the model without providing any new insights.
Synchronous Applications
This article addresses the deployment of synchronous applications onto a quasi-periodic architecture. By synchronous application, we mean a synchronous program that has been compiled into a composition of communicating Mealy machines. The question of generating such a form from a high-level language like Lustre/SCADE, Signal, Esterel [Benveniste et al. 2003], or the discrete part of Simulink does not concern us here.
In the synchronous model, machines are executed in lockstep. But, as our intent is to distribute each machine onto its own network node, we must show that a desynchronized execution yields the same overall input/output relation as the reference semantics. The aim is to precisely describe the activation model and the related requirements on communications, and thereby the form of, and the constraints on, program distribution. The desynchronized executions we consider are still idealized; reproducing them on systems satisfying Definition 3.1 is the subject of Section 5.
A Mealy machine m is a tuple ⟨s_init, I, O, F⟩, where s_init is an initial state, I is a set of input variables, O is a set of output variables, and F is a transition function mapping a state and input values to the next state and output values:
$$F : S \times V^{I} \to S \times V^{O},$$
where S is the domain of state values and V is the domain of variable values. A Mealy machine m defines a stream function from input streams in (V^I)^∞ to output streams in (V^O)^∞, generated by repeated firings of the transition function from the initial state:
$$s(0) = s_{init}, \qquad (s(n+1),\, o(n)) = F(s(n),\, i(n)).$$
The fact that the outputs of Mealy machines may depend instantaneously on their inputs makes both composition [Maraninchi and Rémond 2001] and distribution over a network [Caspi et al. 1994; Benveniste et al. 2000; Potop-Butucaru et al. 2004] problematic. An alternative is to only consider a Moore-style composition of Mealy machines: Outputs may be instantaneous but communications between machines must be delayed. A machine must wait one step before consuming a value sent by another machine. This choice precludes the separation of subprograms that communicate instantaneously, but it increases node independence and permits simpler protocols.
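The Moore-style composition can be illustrated by a small lockstep interpreter. This is a hedged Python sketch: the names `run_lockstep`, `ping`, and `pong` are hypothetical, and, as a simplifying assumption, the unit delays are given explicit initial values rather than being undefined at the first instant.

```python
def run_lockstep(machines, initial_delayed, steps):
    """machines maps a name to (initial_state, F), where
    F(state, delayed_outputs) -> (next_state, output)."""
    states = {name: init for name, (init, _) in machines.items()}
    delayed = dict(initial_delayed)  # values visible through the unit delays
    trace = []
    for _ in range(steps):
        outputs = {}
        for name, (_, f) in machines.items():
            # A machine only reads outputs produced at the previous step,
            # so the order of this loop does not influence the result.
            states[name], outputs[name] = f(states[name], delayed)
        trace.append(outputs)
        delayed = outputs  # unit delay: visible at the next step only
    return trace

def ping(count, delayed):
    return count + 1, delayed["pong"] + 1

def pong(count, delayed):
    return count + 1, delayed["ping"] * 2

machines = {"ping": (0, ping), "pong": (0, pong)}
trace = run_lockstep(machines, {"ping": 0, "pong": 0}, 4)
print(trace)
```

Running the same machines in the opposite order yields the same trace, which reflects the node independence gained by forbidding instantaneous communication between machines.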
For a variable x, let 
Equation (4) states that no machine ever directly depends on the output of another. Equation (5) imposes that a variable is only defined by one machine. Finally, Equation (6) states that an input from the environment is only consumed by a single machine. Otherwise, it would require synchronization among consumers to avoid nondeterminism. Additionally, since the delayed outputs are initially undefined, the composition is only well defined when the F_i do not depend on them at the initial instant.
Footnote 7: X^∞ = X^* ∪ X^ω denotes the set of possibly finite streams over elements of the set X.
In the synchronous model, all processes run in lockstep, that is, executing one step of N executes one step of each m_i. Execution order does not matter since no node ever directly depends on the output of another. Thus, at each step, all inputs are consumed simultaneously to immediately produce all outputs. The Kahn semantics [Kahn 1974] proposes an alternative model where each machine is considered a function from a tuple of input streams to a tuple of output streams (the variables effectively become unbounded queues). Synchronization between distinct components of tuples and between the activations of elements in a composition are no longer required. The semantics of a program is defined by the sequence of values at each variable:
PROPERTY 3.3. For Mealy machines, composed as described above, the synchronous semantics and the Kahn semantics are equivalent.
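PROOF. We write x :: xs ∈ V^∞ to represent a stream of values, where x ∈ V is the first value of the stream, and xs ∈ V^∞ denotes the rest of the stream. Let us first prove for n-tuples of finite or infinite streams of the same length that (V^n)^∞ ≈ (V^∞)^n, that is, that streams of n-tuples and n-tuples of streams are isomorphic.
We define:
$$F((x_1, \dots, x_n) :: xs) = (x_1 :: xs_1, \dots, x_n :: xs_n) \quad \text{where } (xs_1, \dots, xs_n) = F(xs),$$
$$G(x_1 :: xs_1, \dots, x_n :: xs_n) = (x_1, \dots, x_n) :: G(xs_1, \dots, xs_n).$$
By construction, streams x_1 :: xs_1, . . . , x_n :: xs_n all have the same length. Hence, F • G = Id and G • F = Id. This isomorphism can be lifted naturally to functions and we obtain
$$(V^{I})^{\infty} \to (V^{O})^{\infty} \;\approx\; (V^{\infty})^{I} \to (V^{\infty})^{O}$$
for streams of the same length. Mealy machines always consume and produce streams of the same length since the execution of a Mealy machine consumes all inputs at each step and produces all outputs. The two semantics are thus equivalent.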
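The isomorphism between streams of tuples and tuples of streams used in this proof can be illustrated on finite streams. In the Python sketch below, `unzip` and `rezip` are hypothetical names for the two directions, and Python lists stand in for finite streams only.

```python
def unzip(stream_of_tuples, n):
    """From a finite stream of n-tuples to an n-tuple of streams."""
    return tuple([row[i] for row in stream_of_tuples] for i in range(n))

def rezip(tuple_of_streams):
    """From a tuple of finite streams of equal length to a stream of tuples."""
    if len({len(s) for s in tuple_of_streams}) > 1:
        raise ValueError("streams must all have the same length")
    return [tuple(row) for row in zip(*tuple_of_streams)]

s = [(1, "a"), (2, "b"), (3, "c")]   # stream of pairs (synchronous view)
t = ([1, 2, 3], ["a", "b", "c"])     # pair of streams (Kahn view)

print(unzip(s, 2) == t, rezip(t) == s)
```

The length check in `rezip` mirrors the side condition of the proof: the two maps are mutually inverse only on tuples of streams of the same length.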
The overall idea is to take a synchronous application that has been arranged into a Moore-composition of Mealy machines N = m_1 || m_2 || . . . || m_p, so each machine m_i can be placed on a distinct network node. If the transmission and consumption of values respect the Kahn semantics, then the network correctly implements the application. Since we do not permit instantaneous dependencies between variables computed at different nodes, a variable x computed at one node may only be accessed at another node through a unit delay, that is, a delay of one logical step. In this way we need not microschedule node activations.
GENERAL FRAMEWORK
We now consider the implementation of a synchronous application S of p Mealy machines communicating through unit delays on a quasi-periodic architecture with p nodes.
This task is trivial if the underlying nodes and network are completely synchronous, that is, T_min = T_max ≥ τ_max and with all elements initialized simultaneously. One simply compiles each machine and assigns it to a node. At each tick, all the machines compute simultaneously and send values to be buffered at consumers for use at the next tick. The synchronous semantics of an application is preserved directly.
In our setting, however, node activations are not synchronized and we must confront the artifacts described in Section 3.1: duplication, loss of data, and unintended signal combinations. We do this by introducing a layer of middleware between application and architecture. An LTTA is exactly this combination of a quasi-periodic architecture with a protocol that preserves the semantics of synchronous applications. We denote the implementation of an application S on a quasi-periodic architecture as LTTA(S). In this section we present the general framework of implementations based on a discrete synchronous model of the architecture. The details of LTTA protocols are presented in Section 5.
From Continuous to Discrete Time
We describe the protocols by adapting a classic approach to architecture modeling using synchronous languages [Halbwachs and Baghdadi 2002] . In doing so, we exploit the ability of the Zélus language to express delays without a priori discretization.
The quasi-periodic architecture is modeled by a set of clocks. Signals c1, c2, . . . denote the quasi-periodic clocks of the nodes, and dc1, dc2, dc3, . . . their delayed versions that model transmission delays (one for each communication channel). The union of all these signals is a global signal g that is emitted on each event. In Zélus, we write:
The signal g gives a base notion of logical instant or step. It allows us to model the rest of the architecture in a discrete synchronous framework.
Modeling Nodes
An LTTA node is formed by composing a Mealy machine with a controller that determines when to execute the machine and when to send outputs to other nodes. The basic idea comes from the shell wrappers of Latency Insensitive Design (LID) [Carloni et al. 2001; Carloni and Sangiovanni-Vincentelli 2002]. The schema is shown in Figure 2.
A node is activated at each tick of its quasi-periodic clock c:
An LTTA node is modeled in Zélus as:
The controller node is instantiated with one of the controllers described in the following section. At instants determined by the protocol, the controller samples a list of inputs from incoming LTTA links i and passes them on im to trigger the machine, which produces output om (which may be a tuple). The value of om is then sent on outgoing LTTA links o when the protocol allows. The function of the controller is to preserve the semantics of the global synchronous application by choosing (a) when to execute the machine (emission of signal im) and (b) when to send the resulting outputs (emission of signal o). All the protocols ensure that before sending a new value, the previous one has been read by all consumers. Since nodes execute initially without having to wait for values from other nodes, the LTTA controllers reintroduce the unit delays required for correct distribution.
Modeling Links
Delayed communications are modeled by an unbounded FIFO queue that is triggered by the input signal and by the delayed sender clock dc, which models transmission delays (see Section 3.2). Messages in transmission are stored in the queue and emitted when the transmission delay elapses, that is, if clock dc ticks when the queue is not empty.
Each new message v received on signal i is added at the end of the queue q: q = enqueue(last q, v). The keyword last refers to the last defined value of a variable. Then, when a transmission delay has elapsed, that is, each time clock dc ticks when the queue is not empty (when trans is set to true), the first pending message is emitted on signal o and removed from the queue: emit o = front(last q) and q = dequeue(last q).
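The queue discipline described above can be sketched in Python as follows. This is an approximation for illustration, not the Zélus model: the class name `Channel` is hypothetical, `None` encodes an absent signal, and, mirroring the use of last q, a message enqueued at a step is assumed not to be emitted at that same step.

```python
from collections import deque

class Channel:
    """Unbounded FIFO of messages in transmission."""

    def __init__(self):
        self.queue = deque()

    def step(self, i=None, dc=False):
        """One logical step.

        i  -- message received on the input signal, or None if absent
        dc -- True when the delayed sender clock ticks
        Returns the message emitted on o, or None if nothing is emitted.
        """
        # trans: a delay elapses while a message is pending (based on last q).
        trans = dc and len(self.queue) > 0
        emitted = self.queue.popleft() if trans else None
        if i is not None:
            self.queue.append(i)
        return emitted

ch = Channel()
ch.step(i="a")            # message a enters the channel
ch.step(i="b")            # message b queues behind a
print(ch.step(dc=True))   # first delay elapses
print(ch.step(dc=True))   # second delay elapses
print(ch.step(dc=True))   # queue empty: nothing is emitted
```

Messages leave the channel in the order in which they entered it, which matches the order-preservation assumption made on the network.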
Finally, a link between two distinct nodes, shown in Figure 3 , stores the last received value in a memory. Since nodes are not synchronized, the output of a link must be defined at each logical step. All link nodes are thus activated at every emission of the global clock g defined in Section 4.1:
A link is modeled in Zélus as:
When a message is sent on signal i, it goes through the channel and, after the transmission delay modeled by the delayed clock dc, is stored in a memory. New messages overwrite previous memory values. The memory contents are output by the link. Note that the memory mem imposes a unit delay between the input i and the output o thus forbidding instantaneous transmission (Section 2.1). Since we assume that node computations do not depend on the initial values of delayed outputs (Section 3.3), we can initialize the memories of LTTA links with an arbitrary value mi.
Fresh Values. The LTTA controllers must detect when a fresh write is received in an attached shared memory, even when the same value is sent successively. An alternating bit protocol suffices for this task since the controllers ensure that no values are missed:
The value of the Boolean variable flag is paired with each new value received on signal i. Its value alternates between true and false at each emission of signal i. This simple protocol logic is readily incorporated into the link model. An alternating bit is associated to each new value stored in the memory. Within a controller, the freshness of an incoming value can now be detected and signaled:
Variable m stores the alternating bit associated with the last read value. It is updated at each new read signaled by an emission on r. A fresh value is detected when the current value of the alternating bit differs from the one stored in m, that is, when i.alt <> last m. The Boolean flag st states whether or not the initial value is considered as fresh.
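The alternating-bit mechanism can be sketched in a few lines of Python. The class names `Tagger` and `Fresh`, the dictionary representation of messages, and the initial bit value `False` are assumptions made for this illustration only.

```python
class Tagger:
    """Pairs each sent value with a bit that alternates at every send."""

    def __init__(self, initial_alt=False):
        self.alt = initial_alt

    def tag(self, value):
        self.alt = not self.alt
        return {"data": value, "alt": self.alt}

class Fresh:
    """Detects fresh writes by comparing the incoming bit with the bit
    of the last value read, stored in m."""

    def __init__(self, st, initial_alt=False):
        # st states whether the initial memory content counts as fresh.
        self.m = (not initial_alt) if st else initial_alt

    def is_fresh(self, msg):
        return msg["alt"] != self.m

    def read(self, msg):
        # Called when the controller signals a new read (signal r).
        self.m = msg["alt"]

tagger, fresh = Tagger(), Fresh(st=False)
first = tagger.tag(5)
print(fresh.is_fresh(first))   # a new write is detected
fresh.read(first)
print(fresh.is_fresh(first))   # the same write is no longer fresh
second = tagger.tag(5)         # same data, sent again
print(fresh.is_fresh(second))  # still detected thanks to the flipped bit
```

A single bit is enough only because the controllers guarantee that no write is missed; if two writes could occur between two reads, the bit would flip twice and the second write would go undetected.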
THE LTTA PROTOCOLS
We now present the LTTA protocols. There are two historical proposals, one based on back-pressure (Section 5.1) and another based on time (Section 5.2), and two optimizations for networks using broadcast communication (Section 6). 
Back-Pressure LTTA
The Back-Pressure protocol [Tripakis et al. 2008] is inspired by elastic circuits [Cortadella and Kishinevsky 2007; Cortadella et al. 2006] where a consumer node must acknowledge each value read by writing to a back-pressure link [Carloni 2006] connected to the producer. This mechanism allows us to execute a synchronous application on an asynchronous architecture while preserving the Kahn semantics. In an elastic circuit, nodes are triggered as soon as all their inputs are available. This does not work for LTTA nodes since they are triggered by local clocks, so a skipping mechanism was introduced in Tripakis et al. [2008] and included in later Petri net formalizations [Baudart et al. 2014].
For each link from a node A to a node B, we introduce a back-pressure link from B to A. This link is called a (acknowledge) at B and ra (receive acknowledge) at A. The controller, shown in Figure 4 , is readily programmed in Zélus:
The controller automaton has two states. It starts in Wait and skips at each tick until fresh values have been received on all inputs. It then triggers the machine (data(.) accesses the data field of the msg structure), stores the result in a local memory m, sends an acknowledgment to the producer, and transitions immediately to Ready. The controller skips in Ready until acknowledgments have been received from all consumers indicating that they have consumed the most recently sent outputs. It then sends the outputs from the last activation of the machine and returns to Wait.
The freshness of the inputs since the last execution of the machine is tested by a conjunction of fresh nodes (forall_fresh(i, im, true)). The controller also tests whether fresh acknowledgments have been received from all consumers since the last emission of the output signal o (footnote 9).
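The two-state behavior of the Back-Pressure controller can be approximated in Python. This is a sketch under simplifying assumptions rather than the Zélus controller: the freshness tests are passed in as Booleans instead of being computed from links, at most one transition is taken per tick, and the names `BackPressureController` and `tick` are hypothetical.

```python
class BackPressureController:
    """Two-state controller (Wait / Ready) wrapped around a machine."""

    def __init__(self, machine):
        self.machine = machine  # callable: input values -> output values
        self.state = "Wait"
        self.m = None           # local memory holding the last result

    def tick(self, inputs, all_inputs_fresh, all_acks_fresh):
        """One activation by the local clock c.

        Returns the signals emitted at this tick: 'a' (acknowledgment
        to the producers) and/or 'o' (outputs to the consumers).
        """
        emitted = {}
        if self.state == "Wait":
            if all_inputs_fresh:
                self.m = self.machine(inputs)  # execute and store the result
                emitted["a"] = True            # acknowledge the inputs read
                self.state = "Ready"
            # otherwise skip this tick
        elif all_acks_fresh:
            emitted["o"] = self.m              # publish the stored outputs
            self.state = "Wait"
        return emitted

ctrl = BackPressureController(lambda x: x + 1)
print(ctrl.tick(1, False, False))  # skip: inputs not fresh yet
print(ctrl.tick(1, True, False))   # execute and acknowledge
print(ctrl.tick(9, True, False))   # skip: consumers have not acknowledged
print(ctrl.tick(9, True, True))    # publish the result of the last execution
```

Note that the value published in Ready is the one computed from the inputs sampled in Wait, even if the input memories have changed in the meantime.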
Remark 5.1. The composition of a Back-Pressure controller and a Mealy machine to form an LTTA node is well defined. Indeed, the dependency graph of the controller is:
Since the communication with the embedded machine adds the dependency om ← im, the composition of the two machines is free of cycles and therefore well defined.
Preservation of Semantics. This result was first proved in Tripakis et al. [2008] for networks of nodes communicating through buffers of arbitrary size. Another proof is given in Baudart et al. [2014] based on the relation with elastic circuits. We give here a new straightforward proof based on the following liveness property.
PROPERTY 5.2. The kth execution E^N_k of any node N occurs no later than
$$t(E^{N}_{k}) \le 2(T_{\max} + \tau_{\max})(k - 1).$$
PROOF. By induction on k.
Initialization. Since all nodes start at t = 0 and since they can execute immediately without having received values from other nodes, we have for all nodes N, t(E^N_1) = 0.
Induction. Assume the property holds up to and including k. At worst, the last node executes and sends an acknowledgment at t = 2(τ_max + T_max)(k − 1). The last acknowledgment is thus received at worst τ_max later, just after a tick of a receiver's clock. Therefore the receiver does not detect the message until t + τ_max + T_max (footnote 10). The latest kth publication then occurs at t + τ_max + T_max. Symmetrically, this publication is detected at worst τ_max + T_max later. Hence the (k + 1)th execution occurs at worst at t + 2(τ_max + T_max), that is, at 2(τ_max + T_max)k.
Consequently, in the absence of crashes, nodes never block, which is enough to ensure the preservation of semantics.
THEOREM 5.3 (TRIPAKIS ET AL. [2008]; BENVENISTE ET AL. [2010]). Implementing a synchronous application S over a quasi-periodic architecture (Definition 3.1) with Back-Pressure controllers preserves the Kahn semantics of the application:
PROOF. Back-Pressure controllers ensure that nodes always sample fresh values from the memories (guard all_inputs_fresh) and never overwrite a value that has not yet been read (guard all_acks_fresh). Since Property 5.2 ensures that nodes will always execute another step, the Kahn semantics of the application is preserved.
Performance Bounds. Property 5.2 also allows the analysis of the worst-case performance of Back-Pressure LTTA nodes.
THEOREM 5.4 (TRIPAKIS ET AL. [2008]; BENVENISTE ET AL. [2010]). The worst-case throughput of a Back-Pressure LTTA node is
$$\lambda_{BP} = \frac{1}{2(T_{\max} + \tau_{\max})}.$$
PROOF. This result follows from Property 5.2. In the worst case, the delay between two successive executions of a node is 2(T_max + τ_max).
Footnote 9: Initially there are no fresh acknowledgments since controllers start in the Wait state.
Footnote 10: The worst-case transmission delay on a quasi-periodic architecture is T_max + τ_max.
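As a quick numerical illustration of these bounds, the following Python sketch evaluates the worst-case delay between executions and the resulting throughput. The function names and the sample values (in milliseconds) are hypothetical.

```python
def latest_execution_time(k, t_max, tau_max):
    """Latest possible time of the k-th execution of a Back-Pressure node,
    following the induction above: 2 (T_max + tau_max) (k - 1)."""
    if k < 1:
        raise ValueError("executions are numbered from 1")
    return 2 * (t_max + tau_max) * (k - 1)

def back_pressure_worst_case_throughput(t_max, tau_max):
    """Worst-case number of executions per time unit."""
    return 1.0 / (2.0 * (t_max + tau_max))

# Hypothetical architecture: T_max = 11 ms, tau_max = 3 ms.
print(latest_execution_time(3, 11, 3))              # 56 (ms)
print(back_pressure_worst_case_throughput(11, 3))   # 1/28 executions per ms
```

With these values a node is guaranteed to execute at least once every 28 ms, that is, roughly 2.5 times slower than its own worst-case activation period of 11 ms.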
Time-Based LTTA
The Time-Based LTTA protocol realizes a synchronous execution on a quasi-periodic architecture by alternating send and execute phases across all nodes. Each node maintains a local countdown whose initial value is tuned for the timing characteristics of the architecture so, when the countdown elapses, it is safe to execute the machine or publish its results.
A first version of the Time-Based LTTA protocol was introduced in Caspi [2000]. The protocol was later formalized as a Mealy machine with five states, and a simplified version was modeled with Petri nets in Baudart et al. [2014]. We propose an even simpler version that can be expressed as a two-state automaton, formalize it in Zélus, and prove its correctness.
Unlike the Back-Pressure protocol, the Time-Based protocol requires broadcast communication, and acknowledgment values are not sent when inputs are sampled.
ASSUMPTION 1 (BROADCAST COMMUNICATION). All variable updates must be visible at all nodes and each node must update at least one variable.
The controller for the Time-Based protocol is shown in Figure 5, for parameters p and q:
The controller automaton has two states. Initially, it passes via Wait, emits the signal im with the value of the input memory i and thereby executes the machine, stores the result in the local memory m, and enters Ready. In Ready, the equation n = q −→ (last n − 1) initializes a counter n with the value q and decrements it at each subsequent tick of the clock c. At the instant when the Ready counter would become zero, that is, when the previous value last n is 1, the controller passes directly into the Wait state, resets the counter to p, and sends the previously computed outputs from the memory m to o. It may happen, however, that the local clock is much slower than those of other nodes. In this case, a fresh value from any node, exists_fresh(i, im), preempts the normal countdown and triggers the transition to Wait and the associated writing of outputs (exists_fresh is essentially a disjunction of fresh nodes). The Wait state counts down from p to give all inputs enough time to arrive before the machine is retriggered.
Basically, nodes slow down by counting to accommodate the unsynchronized activations of other nodes and message transmission delays but accelerate when they detect a message from other nodes.
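The countdown and preemption logic can be approximated in Python. This is a sketch under stated assumptions, not the Zélus controller: the freshness test exists_fresh is passed in as a Boolean, the counter is initialized so that the first tick executes the machine immediately, fresh values are ignored in Wait, and the class and method names are hypothetical.

```python
class TimeBasedController:
    """Two-state controller (Wait / Ready) with countdowns p and q."""

    def __init__(self, machine, p, q):
        self.machine, self.p, self.q = machine, p, q
        self.state = "Wait"
        self.n = 1      # assumption: lets the first tick execute at once
        self.m = None   # local memory holding the last result

    def tick(self, inputs, exists_fresh):
        """One activation by the local clock c; returns emitted signals."""
        emitted = {}
        if self.state == "Wait":
            if self.n <= 1:                     # Wait countdown has elapsed
                emitted["im"] = inputs          # trigger the machine
                self.m = self.machine(inputs)
                self.state, self.n = "Ready", self.q
            else:
                self.n -= 1
        else:  # Ready
            if exists_fresh or self.n <= 1:     # preemption or countdown
                emitted["o"] = self.m           # publish stored outputs
                self.state, self.n = "Wait", self.p
            else:
                self.n -= 1
        return emitted

ctrl = TimeBasedController(lambda x: 2 * x, p=2, q=3)
for step in range(6):
    print(step, ctrl.tick(1, exists_fresh=False))
```

Without preemption, this sketch publishes q ticks after an execution and executes again p ticks after a publication, which is the alternation assumed by the timing arguments on the countdowns in Wait and Ready.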
Remark 5.5. The composition of a Time-Based controller and a Mealy machine to form an LTTA node is always well defined. The proof is similar to that of Remark 5.1. The dependency graph of a node is
It has no cyclic dependencies. 
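Preservation of Semantics.
THEOREM 5.6. Implementing a synchronous application S over a quasi-periodic architecture (Definition 3.1) with Time-Based controllers preserves the Kahn semantics of the application, provided that both
$$p > \frac{2\tau_{\max} + T_{\max}}{T_{\min}} \qquad (7)$$
$$q > \frac{\tau_{\max} - \tau_{\min} + (p + 1)\,T_{\max}}{T_{\min}} - p. \qquad (8)$$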
PROOF. The theorem follows from two properties which together imply that the kth execution of a node samples the (k − 1)th values of its producers. Since nodes communicate through unit delays, the Kahn semantics is preserved.
[…] possible ticks for the first node to publish). From Equations (7) and (8) we then have
which guarantees that the consumer executes before the reception of the new value.
Induction. Assume that the properties hold up to and including k − 1. The proofs proceed by considering the worst-case scenarios illustrated in Figure 6.
For Property 5.7, if the kth execution of a consumer $E^C_k$ occurs at time $t$, then its (k − 1)th sending $S^C_{k-1}$ must have occurred at or before $t - pT_{\min}$ (countdown in Wait with the shortest possible ticks). This sending is detected by any node at worst $T_{\max} + \tau_{\max}$ later, which causes a producer in the Ready state to send (a producer in the Wait state has already done so), with the value arriving at the consumer at most $\tau_{\max}$ later. Equation (7) guarantees that this value arrives before $t$.
For Property 5.8, if the kth execution of a consumer occurs at time $t$, then the (k − 1)th sending of a producer cannot occur before $t - pT_{\max} - (T_{\max} + \tau_{\max})$, since any send preempts the consumer in Ready at worst after a delay of $T_{\max} + \tau_{\max}$. Since the smallest delay before the subsequent kth send of any producer arrives at the consumer is $pT_{\min} + qT_{\min} + \tau_{\min}$ (countdowns in Wait and Ready with the shortest possible ticks for the first node to publish), Equation (8) guarantees that the kth execution of the consumer occurs beforehand.
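Read as inequalities, the two worst cases in the proof suggest the following shape for the conditions on p and q. This is a reconstruction from the prose of the proof, not a quotation of Equations (7) and (8); in particular, the strictness of each bound is an assumption.

```latex
\begin{align*}
% Property 5.7: the consumer waited at least p T_min after its own send,
% while the producer's value needs at most (T_max + tau_max) + tau_max.
p\,T_{\min} &> T_{\max} + 2\tau_{\max}
  &&\Longleftrightarrow&
p &> \frac{2\tau_{\max} + T_{\max}}{T_{\min}} \\
% Property 5.8: the consumer executes at most (p + 1) T_max + tau_max after
% the producer's (k-1)th send; the k-th value arrives no earlier than
% (p + q) T_min + tau_min after that send.
(p + q)\,T_{\min} + \tau_{\min} &> (p + 1)\,T_{\max} + \tau_{\max}
  &&\Longleftrightarrow&
q &> \frac{\tau_{\max} - \tau_{\min} + (p + 1)\,T_{\max}}{T_{\min}} - p
\end{align*}
```

Both right-hand sides are at least 1 whenever $T_{\min} \le T_{\max}$ and $\tau_{\min} \le \tau_{\max}$, so both counters must be at least 2 under this reading.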
Broadcast Communication. The Time-Based protocol does not wait for acknowledgments from all receivers but rather sends a new value as soon as it detects a publication from another node. Controllers thus operate more independently, but broadcast communication is necessary. Otherwise, consider the scenario of Figure 7, obtained by adding a third node N to the scenario in Figure 6(b) such that it communicates with node P but not node C. Now, P may be preempted in the Ready state one tick after $E^P_k$, causing it to send a message that arrives at C at $S^P_{k-1} + (p + 1)T_{\min} + \tau_{\min}$. Since node C would not be preempted by N but only by P, in the worst case $E^C_k$ occurs $(p + 1)T_{\max} + \tau_{\max}$ after $S^P_{k-1}$. Property 5.8 would then require the impossible condition $(p + 1)T_{\max} + \tau_{\max} < (p + 1)T_{\min} + \tau_{\min}$.
Global Synchronization. In fact, Properties 5.7 and 5.8 imply strictly more than the preservation of the Kahn semantics of an application.
PROOF. Since the Time-Based protocol requires broadcast communication, each node is a producer and consumer for all others. Therefore, Properties 5.7 and 5.8 impose a strict alternation between execute and send phases.
Performance Bounds. Optimal performance requires minimal values for p and q (see footnote 11):
THEOREM 5.10. The worst-case throughput of a Time-Based LTTA node is $\lambda_{TB} = 1/((p + q)T_{\max})$.
PROOF. The slowest possible node spends $pT_{\max}$ in Wait and $qT_{\max}$ in Ready.
Note that this case only occurs if all nodes are perfectly synchronous and run as slowly as possible. Otherwise, slow nodes would be preempted by the fastest one, thus improving the overall throughput. To give a rough comparison with Theorem 5.4, remark that we have p, q ≥ 2; thus, in any case, $\lambda_{TB} \le 1/(4T_{\max})$. A more detailed comparison can be found in Section 7.3.
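As a worked example, the following Python sketch computes the smallest counters and the corresponding worst-case throughput. It assumes that the two conditions on p and q take the form $pT_{\min} > 2\tau_{\max} + T_{\max}$ and $(p + q)T_{\min} + \tau_{\min} > (p + 1)T_{\max} + \tau_{\max}$, which is a reading of the worst cases in the proof rather than a quotation of Equations (7) and (8). The function names are hypothetical, and exact rationals are used to avoid rounding at the boundaries.

```python
from fractions import Fraction
from math import floor


def minimal_pq(t_min, t_max, tau_min, tau_max):
    """Smallest integers p and q satisfying the two assumed strict bounds."""
    t_min, t_max = Fraction(t_min), Fraction(t_max)
    tau_min, tau_max = Fraction(tau_min), Fraction(tau_max)
    # Assumed bound on p: p * T_min > 2 * tau_max + T_max
    p = floor((2 * tau_max + t_max) / t_min) + 1
    # Assumed bound on q: (p + q) * T_min + tau_min > (p + 1) * T_max + tau_max
    q = floor((tau_max - tau_min + (p + 1) * t_max) / t_min - p) + 1
    return p, q


def tb_worst_case_throughput(p, q, t_max):
    """One execution every (p + q) ticks of the slowest possible clock."""
    return 1 / ((p + q) * Fraction(t_max))
```

For instance, with periods in [9, 11] and transmission delays in [1, 2] (arbitrary time units), the sketch yields p = q = 2 and a worst-case throughput of 1/44, which meets the bound $1/(4T_{\max})$ with equality.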
OPTIMIZATIONS
Compared to the Back-Pressure protocol, the Time-Based protocol forces a global synchronization of the architecture. But running the Back-Pressure protocol under the same broadcast assumption (Assumption 1) also induces such strict alternations since every node must wait for all others to execute before sending a new value. However, when all nodes communicate by broadcast, there are simpler and more efficient alternatives. We propose here two optimizations for these particular networks.
Round-Based LTTA
The idea of the Round-Based controller is to force a node to wait for messages from all other nodes before computing and sending a new value. Nodes together perform rounds of execution. Unfortunately, at the start of a round, a value sent from a faster node may be received at a slower one and overwrite the last received value before the latter executes. A simple solution, based on the synchronous network model [Lynch 1996, Chapter 2], is to introduce separate communication and execution phases. In this case, we could simply execute each application every two rounds. But since lockstep execution ensures that no node can execute more than twice between two activations of any other, it is enough to communicate via buffers of size two. This ensures that messages are never overwritten even if nodes execute the application and directly send the output at every activation. Acknowledgments are no longer required. The Zélus code of the controller shown in Figure 8 is as follows:
Footnote 11: For all x ∈ R, ⌊x⌋ denotes the greatest integer i such that i ≤ x.
The forall_fresh now indicates that all input buffers contain at least one value.
Compared to the Back-Pressure and Time-Based protocols, a local memory is not required to store the result of the embedded Mealy machine since the machine's output is immediately sent to other nodes.
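The round mechanism can be illustrated with a short Python sketch. It is not the Zélus controller of Figure 8: the class `RoundBasedNode` and its methods are hypothetical names, the embedded machine is reduced to a plain function over the received values, and the handling of initial values in the first round is left out. The overflow check encodes the claim that, under lockstep execution, a buffer of size two is never exceeded.

```python
from collections import deque


class RoundBasedNode:
    """Illustrative round-based node with two-place input buffers."""

    def __init__(self, peers, step):
        self.buffers = {peer: deque() for peer in peers}
        self.step = step  # stand-in for the embedded machine

    def receive(self, peer, value):
        buf = self.buffers[peer]
        if len(buf) >= 2:
            # Should not happen if all nodes follow the protocol.
            raise OverflowError("two-place buffer overwritten")
        buf.append(value)

    def forall_fresh(self):
        # True when every input buffer holds at least one value.
        return all(len(buf) > 0 for buf in self.buffers.values())

    def activate(self):
        """Run one activation; return the value to broadcast, or None."""
        if not self.forall_fresh():
            return None  # keep waiting for the slowest peer
        inputs = {peer: buf.popleft() for peer, buf in self.buffers.items()}
        return self.step(inputs)  # output is sent immediately, no local memory
```

A node that has already received two values from a fast peer and none from a slow one stays blocked, then consumes exactly one value per peer once the slow peer publishes, so the second value from the fast peer is kept for the next round.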
Remark 6.1. The composition of a Round-Based controller and a Mealy machine to form an LTTA node is always well defined. The proof is again similar to that of Remark 5.1: the dependency graph of a node has no cyclic dependencies.
Preservation of the Semantics. For systems using broadcast communication (Assumption 1), Round-Based controllers induce a synchronous execution throughout the entire system, thus ensuring the preservation of the Kahn semantics. All nodes execute at approximately the same time.
Performance Bounds. Compared to nodes controlled by the Back-Pressure protocol, Round-Based nodes can be twice as fast since they immediately send the output of the embedded machine at each step.
THEOREM 6.2. The worst-case throughput of a Round-Based LTTA node is $\lambda_{RB} = 1/(T_{\max} + \tau_{\max})$.
PROOF. Suppose that the last execution of the (k − 1)th round occurs at time $t$. In the worst case, a node detects this last publication and sends its new message at $t + \tau_{\max} + T_{\max}$. The last execution of the kth round thus occurs $\tau_{\max} + T_{\max}$ after the last execution of the previous round.
Timed Round-Based LTTA
Like the Back-Pressure protocol, the Round-Based protocol uses blocking communication. If a node crashes, then the entire application stops. To avoid such problems, a classic idea is to add timeouts [Attiya et al. 1994] and to run a crash detector together with the Round-Based controller on each node. When a controller executes a step of the application, it knows which other nodes are still functioning, since it has received messages from them, and which have crashed. It can continue to compute using the values last received from crashed nodes.
At each activation, nodes broadcast a heartbeat message to signal that they are still active. Every node A maintains a counter initialized to a value p for each other node. The counter corresponding to a node B is reset to its initial value whenever a heartbeat message is received from B. The following property ensures that when the counter reaches zero, node A can conclude that B has crashed.
PROPERTY 6.3 (ATTIYA ET AL. [1994]). For all nodes A, the counter associated to another node B can only reach zero if B crashed, provided that $\tau_{\min} + pT_{\min} > \tau_{\max} + T_{\max}$.
PROOF. The proof involves considering the worst-case scenario illustrated in Figure 9. Each time a node B executes, it sends a heartbeat message to A. The maximum difference between the times of two consecutive sends is $T_{\max}$. In the worst case, A receives the first message after the shortest possible delay $\tau_{\min}$ and the second after the longest possible delay $\tau_{\max}$. If A runs as fast as possible, then the counter reaches zero $pT_{\min}$ after the reception of the first message. Hence the condition $\tau_{\min} + pT_{\min} > \tau_{\max} + T_{\max}$ suffices to ensure that the counter only reaches zero if node B has crashed.
The Zélus code for the timeout mechanism is as follows:
There is one additional Boolean input i_live for each node. It indicates if a heartbeat message has been received since the last activation.
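The per-node counter can be sketched in Python as follows. This is an illustration rather than the Zélus timeout code: `CrashDetector`, `activate`, and `minimal_timeout` are hypothetical names, the verdict is latched once the counter reaches zero (consistent with fail-stop crashes, but an assumption here), and the smallest p is obtained by solving the condition $\tau_{\min} + pT_{\min} > \tau_{\max} + T_{\max}$ from the proof of Property 6.3 for an integer p.

```python
from fractions import Fraction
from math import floor


class CrashDetector:
    """Counter kept by a node A for one other node B."""

    def __init__(self, p):
        self.p = p
        self.n = p
        self.crashed = False

    def activate(self, i_live):
        """Call once per activation of A; i_live is True if a heartbeat
        from B arrived since the last activation. Returns the verdict."""
        if self.crashed:
            return True        # assumption: a detected crash is permanent
        if i_live:
            self.n = self.p    # heartbeat seen: restart the countdown
        else:
            self.n -= 1
            self.crashed = self.n == 0
        return self.crashed


def minimal_timeout(t_min, t_max, tau_min, tau_max):
    """Smallest integer p with tau_min + p * T_min > tau_max + T_max."""
    bound = Fraction(tau_max - tau_min + t_max) / Fraction(t_min)
    return floor(bound) + 1
```

With periods in [9, 11] and delays in [1, 2], `minimal_timeout` gives p = 2: two silent activations of A then suffice to declare B crashed, whereas p = 1 could misfire on a merely slow B.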
A node executes a step of the application if for every other node it has either received a fresh message or detected a crash. In our model, we need only replace the implementation of fresh(i, r, st) (Section 4.3) with:
Performance Bounds. In the absence of crashes, the timeout mechanism has no influence on the behavior of nodes (Property 6.3) and the Timed Round-Based protocol coincides with the Round-Based one. Otherwise the minimal value for the initial value p is $p = \lfloor (\tau_{\max} - \tau_{\min} + T_{\max})/T_{\min} \rfloor + 1$.
When one or more nodes crash, active nodes wait at worst $pT_{\max}$ before detecting the problem and only then execute a step of the application and send the corresponding message. The delay between two successive rounds is thus bounded by $pT_{\max} + \tau_{\max}$. Since every node broadcasts a message at every step, the timeout mechanism has a high message complexity. An alternative is to send a heartbeat message only once every k steps and to adjust the initial value of the counters appropriately. The worst-case delay between two successive rounds increases accordingly.
CLOCK SYNCHRONIZATION
The LTTA protocols are designed to accommodate the loose timing of node activations in a quasi-periodic architecture. But modern clock synchronization protocols are cost-effective and precise: the Network Time Protocol [Mills 2006] and True-Time [Corbett et al. 2012] provide millisecond accuracies across the Internet, and the Precision Time Protocol [Lee et al. 2005] and the Time-Triggered Protocol [Kopetz 2011, Chapter 8] provide sub-microsecond accuracies at smaller scales. With synchronized clocks, the completely synchronous scheme outlined at the start of Section 4 becomes feasible, raising the question: Is there really any need for the LTTA protocols?
To respond to this question, we recall the basics of one of the most efficient clock synchronization schemes in Section 7.1. Then we work from well-known principles [Kopetz 2011, Chapter 3] to build a globally synchronous system in Section 7.2. Finally, we compare the result with the two LTTA protocols and their round-based counterparts in Section 7.3.
Central Master Synchronization
In central master synchronization, a node's local time reference is incremented by the nominal period $T_{nom}$ at every activation. A distinguished node, the central master, periodically sends the value of its local time to all other nodes. When a slave node receives this message, it corrects its local time reference according to the sent value and the transmission latency. This synchronization scheme is illustrated in Figure 10.
For the quasi-periodic architecture, and assuming the central master is directly connected to all other nodes, the maximum difference between local time references immediately after resynchronization depends on the difference between the slowest and the fastest message transmissions between the central master and slaves:
Fig. 10. [Kopetz 2011, Figure 3.10] Central Master Synchronization: A node's clock stays within the entire shaded area. R denotes the resynchronization interval, Φ the offset after resynchronization, ρ the drift rate between two clocks, and Π the precision of the protocol.
The delay between successive resynchronizations R is equal, at best, to the master's activation period. Between synchronizations, a node clock may drift from the master clock. The maximum drift rate ρ is, in our case,
The optimal precision of clock synchronization is then the maximal accumulated divergence between two node clocks during the resynchronization interval, that is, $\Pi = \Phi + 2\rho R$.
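As a small numeric illustration of this precision formula (the offset after resynchronization plus twice the drift rate times the resynchronization interval), the following Python sketch evaluates it with exact rationals. The function name and the sample values are hypothetical and are not taken from the article.

```python
from fractions import Fraction


def precision(offset, drift_rate, resync_interval):
    """Offset right after resynchronization plus the divergence that two
    clocks drifting in opposite directions accumulate over one interval."""
    return offset + 2 * drift_rate * resync_interval


# Hypothetical values: offset of 2 time units, drift rate of 1%,
# resynchronization every 50 time units.
example = precision(2, Fraction(1, 100), 50)
```

Here the drift contributes 2 × 0.01 × 50 = 1 time unit on top of the initial offset of 2, which shows how the precision degrades linearly as resynchronizations become less frequent.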
The Global Clock Protocol
A global notion of time can be realized by subsampling the local clock ticks of nodes provided the period of the global clock $T_g$ is greater than the precision of the synchronization, that is, $T_g > \Pi$. This assumption is called the reasonableness condition in Kopetz [2011, Chapter 3, Section 3.2.1]. On any given node, the nth tick of the global clock occurs as soon as the local reference time is greater than $nT_g$. These particular ticks of the local clocks are called macroticks. Under the reasonableness condition, the delay between node activations that occur at the same macrotick is less than Π. Activating nodes on each of their macroticks thus naturally imposes a synchronous execution of the architecture. Then, as for the round-based protocols, communication through two-place buffers suffices to ensure that messages are never incorrectly overwritten. Finally, the transmission delay may prevent a value sent at the kth macrotick from arriving before the (k + 1)th macrotick begins. From the maximum transmission delay, we can calculate the number of macroticks m that a node must wait to sample a new value with certainty:
This means that the Kahn semantics of an application is preserved if nodes execute one step every m macroticks and communicate through buffers of size two. This gives the worst-case throughput of the Global Clock protocol (Equation (10)).
Performances. Each of the protocols entails some overhead in application execution time compared to an ideal scheme where $T_{\min} = T_{\max}$ and $\tau_{\min} = \tau_{\max}$. To give a quantitative impression of their different performance characteristics, we instantiate in Table I the worst-case throughputs of the protocols (Theorems 5.4, 5.10, 6.2, and Equation (10)) and calculate the slowdown relative to the ideal case for three different classes of architecture, from the top: slower nodes/faster communication, comparable nodes and communication, faster nodes/slower communication. In each class, we consider three different jitter values (ε) applied to both the nominal period ($T_{nom}$) and transmission delay ($\tau_{nom}$). The slowdown is the relative application speed for a given architecture and protocol: 1.0 indicates the same speed as an ideal system; 2.0 means twice as slow.
The Global Clock protocol shows the best performance when the activation period is much less than the transmission delay. In this case, the cost of clock synchronization is negligible, and lockstep execution with two-place buffers maximizes application activations. For the same reason, protocols optimized for systems using broadcast communication outperform both historical LTTA protocols and the Global Clock protocol, which still requires a little overhead for synchronization (slowdown factor between 1.1 and 1.9). Conversely, when the activation period is much greater than the transmission delay, the Time-Based protocol, which waits for the slowest nodes, has the worst performance. Also, in that case, the overhead due to clock synchronization becomes significant and protocols that do not require this synchronization perform best.
The Time-Based protocol is especially sensitive to jitter; its performance decreases rapidly as jitter increases. Rather than waiting for messages from all other nodes, the Time-Based protocol only needs the very first message received in a round and then waits long enough to be sure that all other messages have been received. It is thus more pessimistic than the Round-Based protocol, which reacts as soon as all inputs are detected.
In all cases, Round-Based protocols achieve the best worst-case throughput, especially if there is significant jitter, and the two historical protocols (BP and TB) show comparable or worse performances than those of the Global-Clock protocol. Note, though, that we consider a simplified and optimistic case; realistic distributed clock synchronization algorithms will have higher overhead. The Time-Based protocol always has the biggest worst-case slowdown, but it is the least intrusive in terms of additional control logic.
Fault Tolerance. The Back-Pressure and Round-Based protocols rely on blocking communication. If a node crashes, then the entire system stops. Therefore, fault tolerance mechanisms must be implemented in the middleware (for instance, resurrection mechanisms). On the other hand, the Time-Based, Timed Round-Based, and Global Clock protocols use timing mechanisms. If a node crashes, then active nodes continue computing using the values last sent by the crashed node. This behavior allows fault tolerance mechanisms to be implemented in the application. We only consider fail-stop crashes. Fault tolerance in the general case with omission or Byzantine failures is a complex problem that requires more sophisticated protocols (with voters, self-checking, agreement protocols, clique avoidance, and node reintegration) [Kopetz and Bauer 2003]. The LTTA protocols aim only to provide a lighter alternative for less demanding systems.
CONCLUSION
In this article, we presented the Back-Pressure and Time-Based LTTA protocols and optimizations of these protocols for systems using broadcast communication in a unified synchronous framework. This gives both a precise description of the implementation of synchronous applications over quasi-periodic architectures and also permits the direct compilation of protocol controllers together with application functions. We show that the Kahn semantics of synchronous applications implemented on quasi-periodic architectures is preserved by all protocols. Finally, we give bounds on the worst-case throughputs of the protocols.
The comparison with an optimistic implementation of clock synchronization shows that the LTTA protocols and their optimizations are at least competitive for jittery architectures where the transmission delay is not significant relative to node periods, which is exactly the class of embedded systems of interest. In addition, LTTA protocols are simple to implement: Nodes need only listen and wait and can thus be implemented as one- or two-state automata.
