initialize state;
loop
  wait clock-tick;
  read inputs;
  compute outputs and state;
  emit outputs
end-loop
Table 1.1: Basic synchronous program structure
adopted this new technology in replacement of analog devices (section 2),
2. and the concerns of some computer theoreticians who were trying to better account for the modelling of reactive computing systems (section 3).
Indeed, synchronous languages arose from the (unlikely) convergence of these two very different concerns, and we shall explain how this convergence took place. We then describe some of the languages in section 4.
As we shall see, these languages are quite useful, yet their implementation capabilities are very restrictive; in section 5 we describe the extensions that have been made, over time, to the initial concepts.
2 From practice...
Programming practices of control and electronic engineers
2.2 The interest of the approach
Though this program structure may appear very basic, it has many advantages in the restricted context to which it is applied:
• It requires a very simple operating system, if any. In fact, it uses a single interrupt, the real-time clock, and, because of the real-time requirement, this interrupt should occur only when the loop body is over, that is to say, when the computer is idle. Thus, there is no need for context switching and this kind of program can even be implemented on a "bare" machine.
• Timing analysis is the simplest possible. It amounts to checking that the worst-case execution time (WCET) C of the loop body is smaller than the clock period T. To make this check even easier, practitioners forbade unbounded loop constructs in the loop body, as well as dynamic memory allocation and recursion, which also avoids many execution errors. Despite these restrictions, assessing WCET has become steadily harder as hardware architectures have grown more complex, with pipelines, caches, etc. Still, this is clearly the simplest possible timing analysis.
• It perfectly matches the needs of sampled-data control and signal processing. Periodic sampling has a long-standing tradition in control and signal processing and comes equipped with well-established theories such as the celebrated Shannon-Nyquist sampling theorem or sampled-data control theory [1].
All this is very simple, safe and efficient. Safety is very important in this context, as many of these control programs apply to safety-critical systems such as commercial aircraft flight control, nuclear plant emergency shutdown, railway signalling and so on. These systems could not have experienced the problems met by the Mars Pathfinder due to priority inversion [2] in programs based on tasks and semaphores. As a matter of fact, operating systems are extremely difficult to debug and validate. Certification authorities in safety-critical domains are aware of this and accept their use only reluctantly. Even the designers of these systems are aware of it and issue warnings about their use, like the warning about the use of priorities in Java [3].
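The loop of Table 1.1 and the associated timing check can be sketched as follows. This is only an illustration in Python (an actual controller would typically be written in C or generated by a compiler); the names `timing_ok`, `step` and `run` are hypothetical, the loop body is an arbitrary running sum, and the clock tick is simulated by iterating over a list of input samples rather than by waiting for a real-time interrupt.

```python
def timing_ok(wcet, period):
    """Timing analysis of the single loop: the body must finish before the next tick (C < T)."""
    return wcet < period


def step(state, inputs):
    """Hypothetical loop body: a running sum standing in for 'compute outputs and state'."""
    new_state = state + inputs
    output = new_state
    return new_state, output


def run(samples, initial_state=0):
    """Table 1.1, with the clock tick simulated by iterating over the input samples."""
    state = initial_state                  # initialize state
    outputs = []
    for inputs in samples:                 # loop: wait clock-tick; read inputs
        state, out = step(state, inputs)   # compute outputs and state
        outputs.append(out)                # emit outputs
    return outputs


print(run([1, 2, 3]))      # [1, 3, 6]
print(timing_ok(2, 10))    # True: a 2 ms body fits in a 10 ms period
```

Note that the state is the only information carried from one iteration to the next, which is why no operating system support is needed.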
Limitations of the approach
Thus, this very simple execution scheme should be used as much as possible. Yet, there are situations where this scheme is not sufficient and more involved ones must be used. We can list here some of these situations:
Multi-periodic systems. When the system under control has several degrees of freedom with different dynamics, the execution periods of the corresponding controls may be different too and, then, the single loop scheme is rather inefficient. In this case an organization of the software into periodic tasks (one task for each period) scheduled by a preemptive scheduler provides a much better utilization of the processor.
In this case, checking the timing properties is somewhat harder than in the single-loop case, but there exist well established schedulability checking methods for addressing this issue [4] .
Yet a problem remains when the different tasks need to communicate and exchange data. It is here that mixing tasks and semaphores raises difficult problems. We shall see in Section 5.1 some ways of addressing this problem that are consistent with the synchronous programming philosophy.
Discrete-event and mixed systems. In many control applications discrete events are tightly combined with continuous control. A popular technique consists of sampling these discrete events in the same way as continuous control.
When such sampling is performed, software can be structured as described previously in the periodic and multi-periodic cases.
There are still more complex cases, in which some very urgent tasks are triggered by non-periodic events. In these cases, more complex scheduling methods need to be used, for instance deadline-monotonic scheduling [5].
In every case but the single period one, the problem of communication remains and has to be addressed, see Section 5.1.
Distributed systems. Finally, most control systems are distributed, for several reasons, for instance, sensor and actuator location, computing power, redundancy linked with fault tolerance, etc. Clearly, most of the computing structures considered above for single computers get more complicated when one moves toward distribution. We shall present in Section 5.2.1 a solution to this problem for the case of a synchronous distributed execution platform.
3 To theory
Milner's synchronous calculus of communicating systems
In the late seventies, computer theorists were thinking of generalizing usual formalisms like the λ-calculus and language theory so as to encompass concurrency, which was urgently needed in view of the ever-growing interest in reactive systems. In this setting, C.A.R. Hoare proposed rendez-vous as the structuring concept for both communication and synchronization between concurrent computing activities [6]. Slightly later, R. Milner used this concept to build a general algebraic concurrent framework called the Calculus of Communicating Systems (CCS), generalizing both the λ-calculus and the calculus of regular expressions of language theory [7]. This approach was very successful and yielded valuable by-products like the discovery of the bisimulation concept.
But it soon appeared that this framework was not expressive enough, in that it did not allow the expression of such a common object of reactive systems as a simple watchdog. Thus, Milner went back to work and invented the Synchronous Calculus of Communicating Systems (SCCS) [8], which could overcome this drawback.
These two calculi are based on different interpretations of the parallel construct, i.e., of what takes place when two computing agents operate concurrently. This difference is shown in Figure 1.1: in the asynchronous version of the calculus, at each time step, one agent is chosen nondeterministically to perform a step, whereas in the synchronous case, each agent performs a step at each time step. Milner also showed that SCCS can simulate CCS and thus that CCS is a sub-theory of SCCS. This, in our opinion, provides some theoretical support to control practices: it means that synchronous primitives are stronger than asynchronous ones and encompass them.
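The two interpretations of the parallel construct can be compared on a toy example. The sketch below is not CCS or SCCS: it ignores communication between agents and only contrasts the two ways of advancing time. The agents (counters cycling through 0 .. n-1) and all function names are assumptions made for the illustration.

```python
def reachable(initial, successors):
    """Set of global states reachable from `initial` under the successor function."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def agent(n):
    """Hypothetical deterministic agent: a counter cycling through 0 .. n-1."""
    return lambda s: (s + 1) % n


def async_product(p, q):
    """Asynchronous composition: at each step, one of the two agents is chosen to move."""
    return lambda s: [(p(s[0]), s[1]), (s[0], q(s[1]))]


def sync_product(p, q):
    """Synchronous composition: both agents move at each step."""
    return lambda s: [(p(s[0]), q(s[1]))]


p, q = agent(3), agent(3)
async_states = reachable((0, 0), async_product(p, q))
sync_states = reachable((0, 0), sync_product(p, q))
print(len(async_states), len(sync_states))   # 9 3
```

In this example the synchronous product has a single successor per state and reaches 3 global states, while the asynchronous one has two successors per state and reaches all 9.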
Synchrony as a programming paradigm
To this landscape G. Berry and the Esterel team added several interesting remarks [9] :
• First they remarked that the synchronous product is deterministic and yields fewer states than the
• an event occurs iff it does not (there is no solution),
• an event occurs iff it occurs (there can be two solutions: either the event occurs or it does not).
This was a general problem of synchronous parallel composition, the causality problem. This problem arose in many frameworks, even in those that did not identify with the synchronous programming school, i.e., whenever concurrent communicating activities take place synchronously. Thus, in each of these frameworks, solutions had to be found and, indeed, many have been proposed. Let us list some of them:
Impose unit delay constraints on instant dialogs. This is the most conservative solution as it suppresses the problem. This solution has been adopted in Lustre [10] , SL [11] , and several execution modes of discrete-time Simulink.
Rely on an arbitrary ordering of actions within a time step. This solution has often been taken in simulation based frameworks. This is the case of some Simulink simulation modes and, most notably, of Stateflow. The advantage comes from the fact that no restrictions need to be imposed on instant dialogs.
The main drawback is that it provides obscure semantics in which the behavior may depend on strange features such as, for instance, the lexicographic order of block names or the graphical position of blocks in a drawing [12].
Order activities within a time step according to a small-step semantics. This is the approach followed by the STATEMATE semantics of Statecharts [13]. The difference with the previous approach is that it does not guarantee the uniqueness of solutions.
Reject non-unique solutions. This is the approach taken in Signal [14] and Argos [15]. At each instant, the different possible emission and reception patterns are gathered and the design is rejected if there is no solution or if there is more than one solution. The drawback of this proposal is that it requires solving, at each time step, an NP-complete problem akin to solving Boolean equations.
Reject non-unique constructive solutions. This is the approach taken by the Esterel team. It is based on a deep criticism of the previous solution which, according to G. Berry, does not allow designers to understand and gain insight into why their design is accepted or not: a designer does not solve Boolean equations in his head when designing a system [16]. Moreover, constructiveness is easier to check and is also consistent with the rules of electric stabilization in hardware design.
As we can see, there are many possible solutions to this problem and it is not yet clear whether a satisfactory one has already emerged.
Some languages and compilers
Although they are based on the same principles, different synchronous languages propose different programming styles:
• dataflow languages are inspired by popular formalisms like block diagrams and sequential circuits. Programs are described as networks of operating nodes connected by wires. The actual syntax can be textual, as in Lustre [10] or Signal [14, 17], or graphical, as in the industrial tool SCADE [18, 19]. Lustre and SCADE are purely functional, and provide a simple notion of clock, while Signal is relational and provides a more sophisticated clock calculus;
• imperative languages are those where the program structure reflects the control flow. Esterel [9] is such a language, providing control structures resembling the ones of classical imperative languages (sequence, loops, etc.) but also interrupts and concurrency. In graphical languages, the control structure is described with finite automata. Argos [15] and SyncCharts [20], both inspired by Statecharts, are such languages, where programs consist of hierarchical, concurrent Mealy machines.
This section gives a flavor of two languages: Esterel as an example of an imperative language, and Lustre as an example of a data-flow language. It also presents some classical solutions for the causality problem, and the basic scheme for code generation.
Esterel
This section presents a flavor of Esterel; a more detailed introduction can be found in [21]. We use an example adapted from that paper: a speedometer, which indefinitely computes and outputs the estimated speed of a moving object.
Signals. Inputs/outputs are defined by means of signals. At each logical time instant, a signal can be either present or absent. In the speedometer example, inputs are pure signals, that is, events occurring from time to time without carrying any value:
• sec is an event which occurs each second; it is typically provided by some real-time clock;
• cm is an event which occurs each time the considered object has moved by one centimeter; it is typically provided by some sensor.
According to the synchrony hypothesis, the program runs on a discrete clock defining a sequence of logical instants. This clock can be kept abstract: the synchronous hypothesis simply states that this base clock is "fast-enough" to capture any occurrence of the inputs sec and cm.
The output of the program is the estimated speed. It can be present or absent, but when present, a numerical value must be provided (the computed speed). It is thus a valued signal, holding (for instance) a floating-point value.
The header of the program (a module in Esterel terminology) is then:
Statements can be put in sequence with the semicolon operator. This sequence is logically instantaneous, which means that both parts occur in the same logical instant. For instance, emit X; emit Y means that X and Y are both present in the current instant, and thus, it is equivalent to emit Y; emit X. However, the sequence reflects a causality precedence as soon as imperative variables are concerned: x := 1; x := 2 results in x = 2, which is indeed different from x := 2; x := 1 (result x = 1).
The unbounded loop is the (infinite) repetition of the sequence: loop P end, where P is a statement, is equivalent to P; P; P; .... The loop statement may introduce instantaneous loops where the logical instant never ends. For instance loop emit X end infinitely emits X in the same logical instant. Such programs are rejected by the compiler: each control path in a loop must pass through a statement which "takes" logical time, typically an await statement.
Interrupts. Suppose for the time being that the event cm is more frequent than sec. Then, we can approximate the speed by the number of cm received between two occurrences of sec. As a consequence, the result will not be accurate if the speed is too low (less than one centimeter per second).
The normal behavior consists in waiting for a cm, then incrementing the counter and so on. This behavior is interrupted when a second occurs: the output speed is then emitted with the current value of the counter, and then, the whole process repeats (see Figure 1.2).
Parallel composition. As said before, the result of the speedometer is not accurate if the speed is too low. In this case, an approximation is the inverse of the number of sec between two occurrences of cm.
In a new version of the program, two processes are running concurrently. One is based on a centimeter counter and is suitable for speeds > 1. The other is based on a sec counter and is suitable for speeds < 1. An actual null speed results in not emitting speed at all. This example illustrates a problem due to synchrony, since two concurrent processes are supposed to produce the same output (speed). In Esterel this problem is solved as follows: a signal is present if it is emitted by at least one concurrent process, and, in case of a valued signal of type τ , the several values emitted at the same logical instant are combined with an associative operator Γ : τ × τ → τ . The user has to specify this operator when declaring the signal, otherwise concurrent emission will be forbidden by the compiler.
This standard solution can be used in our example, since we know that, whenever both sec and cm are present, at most one counter is relevant (≥ 0), while the other is equal to 0. As a consequence, the values emitted on the signal speed can be safely combined with + (see Figure 1.3). However, this solution is somewhat ad-hoc, and one may prefer to program a suitable "combinator" process, running concurrently with the other processes.
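The rule for concurrent emissions on a valued signal can be mimicked in a few lines. This is a sketch of the rule as described above, not Esterel's implementation: the function `combine` is hypothetical, and the missing-operator case, which an Esterel compiler rejects statically, is modelled here by a run-time exception.

```python
from functools import reduce
import operator


def combine(emissions, op=None):
    """Status and value of a valued signal in one logical instant.

    `emissions` lists the values emitted by the concurrent processes in this instant;
    `op` is the associative combination operator declared with the signal, if any.
    Returns (present, value), where value is None when the signal is absent.
    """
    if not emissions:
        return False, None            # emitted by no process: absent
    if len(emissions) > 1 and op is None:
        raise ValueError("concurrent emissions without a combination operator")
    value = reduce(op, emissions) if op is not None else emissions[0]
    return True, value


# Speedometer case: one counter is irrelevant and contributes 0, so + is harmless.
print(combine([0.0, 2.5], operator.add))   # (True, 2.5)
```

Since the operator is associative, the order in which `reduce` folds the emitted values does not affect the result.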
Lustre
Lustre programs as dataflow networks. A Lustre program denotes a network of operators connected by wires.
Semantics. In Lustre every variable denotes a synchronous flow, that is, a function from the discrete time (N) to a declared domain of values. As a consequence, the semantics of the example above is obvious: the program takes two real flows x and y, and computes, step by step, the flow m such that:
Temporal operators. The Lustre language provides the basic data types bool, int and real and all the standard arithmetic and logic operators (and, or, not, +, *, etc.). Constants (e.g. 42, 3.14, true) are interpreted as constant functions over discrete time.
With this subset, one can write combinational programs, which are simply scalar functions extended point-wise on flows. In order to actually operate on flows, the language provides the pre operator: This operator is similar to a non-initialized memory: its value is initially undefined (represented by ⊥), and then, forever, it holds the previous value of its argument.
Some mechanism is necessary to properly initialize flows. This is achieved by the arrow operator:
By combining pre and arrow, one can define recursive flows.
The speedometer in Lustre. Lustre is modular: user-defined nodes can be reused to program more and more complex systems. The syntax is functional-like, and the semantics is simple: instantiating a node is equivalent to inlining the definition of the node (provided local variables are renamed with fresh identifiers).
Note that this does not mean that inlining is necessary for code generation: under some restrictions, code generation can be modular and thus reflect the structure of the source code (see Section 4.4).
For instance, the node counter can be reused to program the speedometer. Just as in the Esterel program (Figure 1.3), we use two counters running concurrently. The results of the counters are used to compute the fast speed (sp1) and the slow speed (sp2). By construction, at most one of these speeds is relevant (which means ≥ 0); as a consequence, the output can be obtained by computing the maximum of sp1 and sp2.
node speedometer(sec, cm: bool) returns (speed: real);
var cpt1, cpt2 : int; sp1, sp2 : real;
let
  cpt1 = counter(cm, sec);
  sp1 = if sec then real(cpt1) else 0.0;
  cpt2 = counter(sec, cm);
  sp2 = if (cm and (cpt2 > 0)) then 1.0/(real(cpt2)) else 0.0;
  speed = max(sp1, sp2);
tel
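To make the flow semantics concrete, the node above can be simulated in Python on finite prefixes of flows, represented as lists. The helpers `pre` and `arrow` follow the descriptions given earlier (with None standing for ⊥). The definition of `counter` is an assumption, since the node is not reproduced in this text: we suppose that it counts the occurrences of its first argument since the last occurrence of its second argument, the reset taking effect at the following instant.

```python
def pre(xs):
    """pre x: undefined (None stands for ⊥) at the first instant, then the previous value of x."""
    return [None] + xs[:-1]


def arrow(xs, ys):
    """x -> y: the value of x at the first instant, then the values of y."""
    return xs[:1] + ys[1:]


def counter(x, reset):
    """ASSUMED definition (hypothetical, the original node is not shown here):
         c  = lc + (if x then 1 else 0);
         lc = if (true -> pre reset) then 0 else pre c;
    """
    c, prev_c, prev_reset = [], None, None
    for t, (xt, rt) in enumerate(zip(x, reset)):
        restart = True if t == 0 else prev_reset   # true -> pre reset
        lc = 0 if restart else prev_c              # pre c is never read at t = 0
        ct = lc + (1 if xt else 0)
        c.append(ct)
        prev_c, prev_reset = ct, rt
    return c


def speedometer(sec, cm):
    """Instant-by-instant evaluation of the equations of the speedometer node."""
    cpt1 = counter(cm, sec)
    cpt2 = counter(sec, cm)
    speed = []
    for s, m, c1, c2 in zip(sec, cm, cpt1, cpt2):
        sp1 = float(c1) if s else 0.0
        sp2 = 1.0 / c2 if (m and c2 > 0) else 0.0
        speed.append(max(sp1, sp2))
    return speed


T, F = True, False
print(arrow([0, 0, 0], pre([1, 2, 3])))               # [0, 1, 2]
print(speedometer([F, F, T, F, F, T], [T, T, T, T, F, T]))
# [0.0, 0.0, 3.0, 0.0, 0.0, 2.0]
```

Under the assumed counter, three cm events seen up to the first sec give a speed of 3.0, and four sec events followed by a single cm give 0.25, as in `speedometer([T, T, T, T, F], [F, F, F, F, T])`.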
Causality checking
Among classical static checks (type checking, references, etc.), the compilation of synchronous languages requires checking causality. As explained in Section 3.3, the logical simultaneity of causes and consequences may introduce logical loops. This is the case:
• in Esterel, when the status of a signal depends on itself, e.g. present X then emit X;
• in Lustre, when the definition of a flow is instantaneously recursive, e.g. X = X or Y.
Whatever the language, the problem is in fact similar: instantaneous dependencies may contain loops that must be checked in some manner. We present here three solutions adopted in actual compilers.
Syntactic reject. This is in some sense the strictest approach: it consists in statically rejecting any program containing a recursion in the instantaneous dependency relation. This solution is adopted in Lustre (and SCADE), and the criterion is purely syntactic. For instance, the following program is rejected, even though it is impossible to have a causality loop at run-time (if condition C is true then X depends on Y, if C is false then Y depends on X):
This choice in Lustre is purely pragmatic: it is widely accepted that combinational loops in dataflow descriptions are error-prone and should be avoided, even in the case where the loop can actually be solved by some reasoning.
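The syntactic criterion amounts to searching for a cycle in the graph of instantaneous dependencies. The sketch below is a generic depth-first search, not the algorithm of any particular compiler. The dependency maps are written by hand, and the first example is a hypothetical program matching the description above (X = if C then Y else A; Y = if C then B else X).

```python
def has_instantaneous_cycle(deps):
    """Depth-first search for a cycle in the instantaneous dependency relation.

    `deps` maps each defined flow to the set of flows its definition reads
    in the same instant (flows read only under `pre` are left out).
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, done
    color = {v: WHITE for v in deps}

    def visit(v):
        color[v] = GREY
        for w in deps[v]:
            if w not in color:            # input flow: no definition, no cycle through it
                continue
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in deps)


# Hypothetical program: X = if C then Y else A;  Y = if C then B else X
guarded = {"X": {"C", "Y", "A"}, "Y": {"C", "B", "X"}}
# N = 0 -> pre N + 1: N is read only under pre, hence no instantaneous dependency
naturals = {"N": set()}
print(has_instantaneous_cycle(guarded), has_instantaneous_cycle(naturals))   # True False
```

The check ignores the value of C altogether, which is exactly why the guarded program is rejected although no loop can occur at run-time.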
Boolean causality. This solution is the opposite of the previous one, in the sense that it aims at accepting any "safe" program, regardless of syntactic loops. Intuitively, this analysis accepts any program that can be proved to be both reactive and deterministic, i.e., to have exactly one possible reaction (pair of next state and outputs), for each set of inputs, no matter what the current state is.
More precisely, instantaneous dependencies are represented by a system of logical equations of the form V = F(V, I), where V are the local and output variables and I are the inputs. The system is considered as correct if, for any valuation of I, the system admits a unique solution for V. For instance, supposing that V = {x, y}, I = {c} and f, g are Boolean functions, the system:
x = c ∧ f(y), y = ¬c ∧ g(x)
admits a unique solution for both c = 1 (x = f(0), y = 0) and c = 0 (x = 0, y = g(0)), and thus, it is accepted.
This solution rejects programs that clearly make no sense. For instance, consider an Esterel program where the only emission of X appears in the statement present X else emit X. Instantaneous dependencies give the equation X = ¬X, which has no logical solution; the program is not reactive and thus, it is rejected.
Consider now another example where the unique emission of X appears in "present [X or A] then emit X end". The underlying logical equation is then X = X ∨ A, which has a unique solution if A = 1, but two solutions when A = 0. The corresponding program is then rejected as non-deterministic.
Boolean causality has been adopted in Argos and in early versions of the Esterel compiler. However, it suffers from several drawbacks:
• It accepts "dubious" programs. For instance, consider the system (X = X, Y = X ∧ ¬Y); this system seems to have all the bad characteristics: X is not constrained, while Y depends on its own negation. However, the system admits a unique solution X = Y = 0, and thus, the corresponding program is accepted. Note that this system, if implemented as a combinational circuit in the straightforward way, yields an unstable circuit, that is, a circuit with races.
• Boolean causality requires satisfiability checking, which is NP-complete. As a consequence, the static check is likely to become intractable as the number of variables grows.
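The Boolean causality criterion can be illustrated by brute-force enumeration on the examples above. This sketch enumerates all valuations, so its exponential cost is deliberate and only meant for tiny systems; the function names and the dictionary encoding of equations are assumptions made for the illustration.

```python
from itertools import product


def solutions(equations, inputs):
    """All valuations of the variables satisfying V = F(V, I) for fixed inputs."""
    names = list(equations)
    found = []
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if all(equations[n](v, inputs) == v[n] for n in names):
            found.append(v)
    return found


def boolean_causal(equations, input_names):
    """Accept iff there is exactly one solution for every valuation of the inputs."""
    for values in product([False, True], repeat=len(input_names)):
        inputs = dict(zip(input_names, values))
        if len(solutions(equations, inputs)) != 1:
            return False
    return True


not_reactive = {"X": lambda v, i: not v["X"]}                 # X = ¬X
not_deterministic = {"X": lambda v, i: v["X"] or i["A"]}      # X = X ∨ A
dubious = {"X": lambda v, i: v["X"],                          # X = X
           "Y": lambda v, i: v["X"] and not v["Y"]}           # Y = X ∧ ¬Y

print(boolean_causal(not_reactive, []))            # False: no solution
print(boolean_causal(not_deterministic, ["A"]))    # False: two solutions when A = 0
print(boolean_causal(dubious, []))                 # True: unique solution X = Y = 0
```

The last line shows the first drawback listed above: the dubious system passes the check.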
Constructive causality. This solution has been proposed [22, 23] to circumvent the problems of Boolean causality, which means: to reject the doubtful programs, and to allow a (much) more efficient static check.
The idea is to solve recursive equations of the form V = F (I, V ) according to constructive (or intuitionistic) logic rather than classical Boolean logic.
Roughly speaking, constructive logic is classical logic without the tertium non datur principle (i.e. without the axiom ∀x, x ∨ ¬x). In constructive logic, the system (X = X, Y = X ∧ ¬Y) has no solution, but x = c ∧ f(y), y = ¬c ∧ g(x) still admits a unique solution (as a function of c): x = c ∧ f(0), y = ¬c ∧ g(0).
Concretely, constructive logic behaves as a propagation of facts in the four-valued lattice {⊤, 0, 1, ⊥}, where ⊤ is the erroneous value (both true and false), and ⊥ the unknown value (either true or false). If, during the evaluation, some variable becomes ⊤, the system has no solution. If the evaluation stops while a variable is still ⊥, the system has several solutions.
Solving closed systems in constructive logic is similar to partial evaluation, and thus, has a polynomial cost. The complexity still grows exponentially by adding free variables such as inputs (the system must be solved for any valuation of free variables). However, efficient symbolic algorithms exist [23], and, in practice, constructive causality is much more efficient than Boolean causality.
We have shown that constructive causality rejects programs that are obviously wrong, or at least dubious. But what exactly are the right programs, and does the method accept them all? G. Berry [22] pointed out a natural argument in favor of constructive causality, by stating that it reflects what actually happens in a digital circuit: recursive systems that are constructively correct are those which, when mapped to hardware, give circuits that eventually stabilize.
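The propagation of facts underlying constructive causality can be sketched as a fixpoint computation. This is a simplified illustration, not the symbolic algorithm of [23]: it keeps only the three values ⊥ (None), 0 and 1 and does not model ⊤, it enumerates input valuations explicitly, and the choice f = negation, g = identity is arbitrary.

```python
from itertools import product


def k_not(a):
    """Negation extended to the unknown value None (⊥)."""
    return None if a is None else (not a)


def k_and(a, b):
    """Conjunction: a single known 0 is enough to conclude 0."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Disjunction: a single known 1 is enough to conclude 1."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def constructive_solution(equations, inputs):
    """Start from all-⊥ and propagate established facts until nothing changes."""
    v = {n: None for n in equations}
    changed = True
    while changed:
        changed = False
        for n, f in equations.items():
            if v[n] is None:
                new = f(v, inputs)
                if new is not None:
                    v[n], changed = new, True
    return v


def constructive_causal(equations, input_names):
    """Accept iff every variable gets determined for every valuation of the inputs."""
    for values in product([False, True], repeat=len(input_names)):
        inputs = dict(zip(input_names, values))
        if None in constructive_solution(equations, inputs).values():
            return False
    return True


dubious = {"X": lambda v, i: v["X"],                             # X = X
           "Y": lambda v, i: k_and(v["X"], k_not(v["Y"]))}       # Y = X ∧ ¬Y
guarded = {"x": lambda v, i: k_and(i["c"], k_not(v["y"])),       # x = c ∧ f(y), f = ¬
           "y": lambda v, i: k_and(k_not(i["c"]), v["x"])}       # y = ¬c ∧ g(x), g = id

print(constructive_causal(dubious, []))       # False: X and Y stay unknown
print(constructive_causal(guarded, ["c"]))    # True
```

For the guarded system the propagation mirrors the reasoning in the text: when c = 1, y is first established as 0, and x then follows as f(0).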
Code generation
The importance of this topic is due to the very peculiar position of synchronous languages, somewhere between modeling tools and programming languages. According to which definition is chosen, one would address the topic as "compilation" or as "program synthesis". The truth is somewhere in between: code generation for synchronous languages is in general harder than usual compilation but still easier than program synthesis. It should be noted that code generation is still met with resistance: in many cases, even if people use synchronous modeling tools (e.g., Simulink/Stateflow), they still prefer to manually recode the models, mainly for efficiency reasons, related either to performance or code length or memory width. This situation is reminiscent of the old days of assembly versus high-level languages. There is little doubt that, sooner or later, code generation from high-level models will become mainstream. This is why this question is addressed here.
We only consider here the basic problem, which is the generation of purely sequential code, suitable for a single-task, mono-processor implementation, as shown in Table 1.1.
Synchronous compilers do not actually produce the full program: they identify the necessary memory (and its initial value), and produce a procedure implementing a single step of execution (the "compute outputs and state" step of Table 1.1). In other terms, compilers only provide the functionality of the system, and some main loop program should be written to define the actual execution rate (e.g., event-driven or periodically-sampled, depending on the application).
From data-flow to sequential code
Consider the example of the counter (Figure 1
Following those principles, the target code is a simple sequence of assignments, as shown on the left of Figure 1.7. The main advantage of this somewhat naive algorithm is that it produces code which is neither better nor worse than the source code: both the size and the execution time are linear with respect to the size of the source code. This one-to-one correspondence between source and target code is particularly appreciated in critical domains like avionics, and it has been adopted by the SCADE compiler.
However, some optimizations can be made in order:
• to minimize the number of buffers (in the example, pre c is redundant),
• to speed-up execution by building a control structure.
An optimized code is shown on the right of Figure 1.7.
Explicit automaton
For imperative languages like Esterel, and even more for languages based on concurrent Mealy machines, it may seem a good idea to generate code whose structure is an explicit automaton. The principle of automaton generation is illustrated in Figure 1
the procedure is a switch on the enumerated variable implementing the current state. In this example, inputs/outputs are performed using procedures that are not detailed.
Compilation into an explicit automaton produces code which is very efficient in terms of execution time: a typical call of the next procedure only requires a few tests and assignments. The drawback is indeed the size of the code: flattening a hierarchical, parallel composition of Mealy machines may result in code exponentially larger than the source program. As a consequence, this compilation scheme was soon abandoned for methods providing a more realistic compromise between code size and execution time. 
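The exponential blow-up is easy to reproduce on a toy example. In the sketch below, the component machine `toggle` and the function `flatten` are hypothetical; the point is only that the parallel composition of n two-state machines, once flattened, has 2^n global states, while the source description grows linearly with n.

```python
from itertools import product


def toggle(state, pressed):
    """Hypothetical two-state Mealy machine: it changes state when its input is present
    and emits an output when it goes from state 1 back to state 0."""
    if pressed:
        return 1 - state, state == 1
    return state, False


def flatten(n):
    """Explicit automaton of n toggles in parallel: one entry per (global state, global input)."""
    table = {}
    for state in product([0, 1], repeat=n):
        for inputs in product([False, True], repeat=n):
            results = [toggle(s, i) for s, i in zip(state, inputs)]
            next_state = tuple(r[0] for r in results)
            outputs = tuple(r[1] for r in results)
            table[(state, inputs)] = (next_state, outputs)
    return table


table = flatten(3)
states = {state for state, _ in table}
print(len(states), len(table))   # 8 64
```

Once the table is built, a reaction is a single lookup, which is the execution-time advantage mentioned above; the price is a table whose size doubles (in states) with each added component.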
Implicit automaton
Several techniques for generating efficient and compact code for Esterel have been proposed [24, 25, 26, 27].
All these methods are far too sophisticated to be presented in a few lines. However, they all share the same characteristic, namely, that the state is encoded using several Boolean variables instead of a single enumerated variable. Note that an automaton with n states can be encoded using ⌈log_2(n)⌉ Boolean variables. One can note that, while the memory required is now clearly linear with respect to the source size, the number of required operations seems to have grown dramatically. Thus, some work remains in order to actually obtain concise and efficient code, for instance:
• identify common sub-expressions, in order to compute things once,
• embed the whole computation step into a local control structure to avoid useless computation,
• or even, try to build a new encoding giving a better compromise between the number of state variables and the size of code.
In practice, efficient Esterel compilers [25, 28] are able to produce code whose size is reasonable (most of the time linear, quadratic in the worst case) and whose execution time is similar to what can be obtained by "hand-made" code.
Back to practice
We have seen several examples of synchronous languages. We have also seen how a single-processor and single-task (or, equivalently, single-process or single-thread) implementation can be automatically generated from a synchronous program. Such an implementation is simple: it consists of initialization code followed by a read-compute-write loop triggered by some external event (clock "tick" or other). Despite its advantages, discussed above, this "classical" implementation method also has limitations, as discussed in Section 2.3 above: it is not well-suited for multi-rate applications (multi-periodic or mixed time- and event-triggered) and it is not applicable to implementations on distributed execution platforms.
Multi-rate applications and distributed implementations are common in industrial practice. In order to deal with these cases, new methods have been developed which allow synchronous programs to be implemented on various software or hardware architectures, for instance, involving many tasks or many processors connected by a bus. We review some of these methods in the subsections that follow.
Single-processor, preemptive multi-tasking implementations
In this subsection we will explain how synchronous programs can be implemented on a single processor that is equipped with a real-time operating system (RTOS) employing some type of preemptive scheduling policy, such as static-priority or earliest-deadline first (EDF). In such a case, we can generate a multi-task instead of a single-task implementation. This means that there will be many tasks, each corresponding to a part of the synchronous program. The RTOS will schedule the tasks for execution on the processor.
To justify the interest of multi-task implementations, let us consider a simple example. Consider a synchronous program consisting of two parts, or tasks, P1 and P2, that must be executed every 10 ms and every 100 ms, respectively. Suppose the worst-case execution times (WCET) of P1 and P2 are 2 ms and 10 ms, respectively, as shown in Figure 1.12. Then, generating a single task P that includes the code of both P1 and P2 would be problematic. P would have to be executed every 10 ms, since this is required by P1. Inside P, P2 would be executed only once every 10 times (e.g., using an internal counter modulo 10). Assuming that the WCET of P is the sum of the WCETs of P1 and P2 (note that this is not always the case), we find that the WCET of P is 12 ms, that is, greater than its period. In practice, this means that, once every ten periods, the task P1 will be delayed by 2 ms. This may appear harmless in this simple case, but the delays might be larger and much less predictable in more complicated cases.
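The arithmetic of this example can be checked with a few lines of Python. The function names are hypothetical, and the additive WCET model is the simplifying assumption already made in the text. The utilization figure is an addition of ours: it shows that the overrun occurs although the processor is far from saturated.

```python
def single_task_wcet(wcets):
    """Worst-case cost of one cycle of the merged task, assuming the WCETs simply add up."""
    return sum(wcets)


def overrun(wcets, base_period):
    """Time by which the worst-case cycle exceeds the base period (0 if it fits)."""
    return max(0, single_task_wcet(wcets) - base_period)


def utilization(tasks):
    """Processor utilization of periodic tasks given as (wcet, period) pairs."""
    return sum(c / t for c, t in tasks)


print(single_task_wcet([2, 10]))                   # 12 (ms), greater than the 10 ms period
print(overrun([2, 10], 10))                        # 2 (ms) of delay for the next P1
print(round(utilization([(2, 10), (10, 100)]), 2)) # 0.3
```

With only 30% of the processor time needed on average, the problem is not a lack of computing power but the fact that the single loop cannot interleave P2 with several activations of P1.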
Until recently, there has been no rigorous methodology for handling this problem. In the absence of such a methodology, industrial practice consists in "manually" modifying the synchronous design, for instance, by "splitting" tasks with large execution times, like task P2 above. Clearly, this is not satisfactory as it is both tedious and error-prone.
When an RTOS is available, and the two tasks P1 and P2 do not communicate with each other (i.e., do not exchange data in the original synchronous design), there is an obvious solution: generate code for two separate tasks, and let the RTOS handle the scheduling of the two tasks. Depending on the scheduling policy used, some parameters need to be defined. For instance, if the RTOS uses a static-priority scheduling policy, as is the case most of the time, then a priority must be assigned to each task prior to execution. During execution, the highest-priority task among the tasks that are ready to execute is chosen. In the case of multi-periodic tasks, as in the example above, the rate-monotonic assignment policy is known to be optimal in the sense of schedulability [4]. This policy consists in assigning the highest priority to the task with the highest rate (i.e., smallest period), the second highest priority to the task with the second highest rate, and so on.
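The rate-monotonic assignment can be written down directly. The sketch below also includes the classical sufficient utilization test of Liu and Layland, U ≤ n(2^(1/n) − 1), which is not stated in the text and holds for independent periodic tasks whose deadlines equal their periods. All names are hypothetical.

```python
def rate_monotonic_priorities(periods):
    """Map each task name to a priority, 0 being the highest: smaller period, higher priority."""
    ordered = sorted(periods, key=periods.get)
    return {name: rank for rank, name in enumerate(ordered)}


def liu_layland_bound(n):
    """Utilization bound n(2^(1/n) - 1) for n tasks under rate-monotonic priorities."""
    return n * (2 ** (1 / n) - 1)


def rm_schedulable(tasks):
    """Sufficient (not necessary) test; `tasks` maps names to (wcet, period) pairs."""
    u = sum(c / t for c, t in tasks.values())
    return u <= liu_layland_bound(len(tasks))


tasks = {"P1": (2, 10), "P2": (10, 100)}
periods = {name: period for name, (_, period) in tasks.items()}
print(rate_monotonic_priorities(periods))   # {'P1': 0, 'P2': 1}
print(rm_schedulable(tasks))                # True: 0.3 is below the bound (about 0.83 for n = 2)
```

A task set that fails this test is not necessarily unschedulable; an exact analysis would then be needed.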
This solution is simple and works correctly as long as the tasks do not communicate with each other. However, this is not the common case in practice. Typically, there will be data exchange between tasks of different periods. In such a case, some inter-task communication mechanism must be used. On a single processor, this mechanism usually involves some form of buffering, that is, shared memory accessed by one or more tasks. Different buffering mechanisms exist:
• simple ones, such as a buffer for each pair of writer/reader tasks, equipped with a locking mechanism to avoid corruption of data because of simultaneous reads and writes;
• same as above but also equipped with a more sophisticated protocol, such as a priority inheritance protocol to avoid the phenomenon of priority inversion [2] , or a lock-free protocol to avoid blocking upon reads or writes [29, 30] ;
• other shared-memory schemes, like the publish-subscribe scheme used in the PATH project [31, 32] , which allows decoupling of writers and readers.
None of these buffering schemes, however, guarantees preservation of the original synchronous semantics.
This means that the sequence of outputs produced by some task in the implementation may not be the same as the sequence of outputs produced by the same task in the original synchronous program. Small discrepancies between semantics and implementation can sometimes be tolerated, for instance, when the task implements a robust controller which has built-in mechanisms to compensate for errors. In other applications, however, such discrepancies may produce totally wrong results, with catastrophic consequences. Having a method that guarantees equivalence of semantics and implementation is then crucial. It also implies that the effort spent in simulation or verification of the synchronous program need not be duplicated for the implementation. This is an extremely important cost factor.
To show why preservation of synchronous semantics is not generally guaranteed, consider the example shown in Figure 1.13. The timeline on the top of the figure shows the arrivals (releases) of three tasks: τ i , τ j and τ q .
The timeline on the bottom of Figure 1.13 shows a possible behavior of a simple implementation of the three tasks. We suppose that each task is executed as a separate process on an RTOS that uses static-priority preemptive scheduling. We suppose that task τ q has higher priority than τ i and τ i has higher priority than τ j . The execution periods of the different tasks are represented by the "boxes" shown in the figure. It can be seen that task τ q , because of its higher priority, continues to execute when τ j and τ i are released. In that way, it "masks" the order of arrivals of τ i and τ j , which results in an inverse execution order: τ i executes before τ j , because it has higher priority. As a result, task τ j reads the wrong output of τ i : $y_i^{k+1}$ instead of $y_i^k$.
In the rest of this subsection we present a novel implementation method which makes it possible to systematically build multi-task implementations of synchronous programs while preserving the synchronous semantics. This method has been developed in a number of recent works [33, 34, 35] . We only sketch the main ideas of the method here, and refer the reader to the above publications for details.
The method consists in a set of buffering schemes that mediate data between writer and reader tasks. Each of these schemes involves a set of buffers shared by the writer and the readers, a set of pointers pointing to these buffers and a protocol to manipulate buffers and pointers. The essential idea behind all protocols is that the pointers must be manipulated not during the execution of the tasks, but upon the arrival of the events triggering these tasks. In that way, the order of arrivals can be "memorized" and the original semantics can be preserved.
Let us describe one of these protocols, called the high-to-low buffering protocol, and described in Figure 1.14. This protocol is used when the writer has higher priority than the reader (for simplicity we assume a single reader, but the protocol can be extended to any number of readers, with the addition of extra buffers, see [35] ). In this protocol, the reader τ j maintains a double buffer B[0,1]. The reader also maintains two boolean variables current, next. current points to the buffer currently being read by τ j and next points to the buffer that the writer must use when it arrives next, in case it preempts the reader. The two buffers are initialized to a default value (which the reader reads if it is released before any release of the writer) and both bits are initialized to 0.
When the reader task is released, it copies next into current, and reads from B[current] during its execution. When the writer task is released, it checks whether current is equal to next. If they are equal, then a reader might still be reading from B[current], therefore, the writer must write to the other buffer, in order not to corrupt this value. Thus, the writer toggles the next bit in this case. If current and next are not equal, then this means that one or more instances of the writer have been released before any reader was released, thus, the same buffer can be re-used. During execution, the writer writes to B[next].
Setting: a high-priority writer task τ i communicates data to a low-priority reader task τ j .
Task τ j maintains a double buffer B[0,1] and two boolean variables, current and next, both initialized to 0.
During execution:
• When τ i is released: if current = next, then next := not next.
• While τ i executes, it writes to B[next].
• When τ j is released: current := next.
• While τ j executes, it reads from B[current].
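The four rules of the protocol can be sketched as a small Python class. This is only an illustration for a single reader: the class and method names are ours, the release hooks would in practice be invoked by the RTOS when the triggering events arrive, and the atomicity of these hooks is assumed rather than modelled.

```python
class HighToLowBuffer:
    """Double buffer between a high-priority writer and a low-priority reader."""

    def __init__(self, default):
        self.buffers = [default, default]  # B[0], B[1]
        self.current = 0                   # buffer being read by the reader
        self.next = 0                      # buffer the writer must use

    def on_writer_release(self):
        # If both bits agree, the reader may still be using B[current]:
        # switch the writer to the other buffer.
        if self.current == self.next:
            self.next = 1 - self.next

    def write(self, value):
        self.buffers[self.next] = value

    def on_reader_release(self):
        self.current = self.next

    def read(self):
        return self.buffers[self.current]


buf = HighToLowBuffer(default="init")
buf.on_reader_release()     # reader released before any writer
buf.on_writer_release()     # writer arrives and preempts the reader
buf.write("v1")
print(buf.read())           # the preempted reader still sees the default
buf.on_reader_release()     # next reader instance
print(buf.read())           # now sees the value written by the writer
```

Note that the pointer updates happen only in the two release hooks, never in `write` or `read`, which is what allows the order of arrivals to be memorized independently of the actual execution order.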
Distributed implementations
In this subsection we will explain how synchronous programs can be implemented on distributed architectures, consisting of a number of computers connected via some type of network. Most embedded applications today involve such distributed architectures. For instance, high-end cars today may contain up to seventy electronic control units connected by many different busses. The implementation of a synchronous program consists of one or more tasks per computer, plus extra "glue code" that handles inter-task communication.
In the rest of the section, we present two distributed implementation methods.
Implementation on the Time-Triggered Architecture
The Time Triggered Architecture [36] or TTA is a distributed, synchronous, fault-tolerant architecture. It includes a set of computers (or nodes) connected via a synchronous bus. The bus is synchronous in the sense that it implements a clock-synchronization protocol. This implies a global time frame for all nodes. Access to the bus is done using a statically defined time-triggered schedule: each node is allowed to transmit during a specified time slot. The schedule repeats periodically as shown in Figure 1 .15. The schedule ensures that no two nodes transmit at the same time (provided their clocks are synchronized, which is the responsibility of the TTA bus controller). Therefore, no on-line arbitration is required, contrary to the case of the CAN bus [37] . A set of tasks is executing at each TTA node. The latter is equipped with the OSEKtime operating system [38] which allows tasks to be also scheduled in a time-triggered manner.
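To make the idea of a statically defined, periodically repeating bus schedule concrete, here is a minimal Python sketch. The node names and slot lengths are hypothetical and do not come from any TTA specification; the sketch assumes a perfectly synchronized global time expressed in whole milliseconds.

```python
# One round of the schedule: (owner node, slot length in ms). Hypothetical values.
SLOTS = [("node_A", 2), ("node_B", 2), ("node_C", 1)]


def round_length(slots):
    """Duration of one complete round of the schedule."""
    return sum(length for _, length in slots)


def owner_at(slots, t_ms):
    """Return the only node allowed to transmit at global time t_ms."""
    offset = t_ms % round_length(slots)  # position inside the current round
    for owner, length in slots:
        if offset < length:
            return owner
        offset -= length
    raise AssertionError("unreachable: offset is always inside the round")


print([owner_at(SLOTS, t) for t in range(7)])
```

Since every instant maps to exactly one owner, two nodes can never transmit at the same time, and no arbitration is needed at run time.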
Generating a TTA implementation from a synchronous program means the following:
• Decomposing the synchronous program into a set of tasks.
• Assigning each task to a TTA node where it is going to execute.
• Scheduling the tasks on each node and the messages on the TTA bus.
• Generating code for each task and glue code for the messages.
Notice that the first two steps are not particular to an implementation on TTA. Any distributed implementation of a synchronous program requires at least these two steps. The scheduling steps are specific to TTA. Implementations on other types of platforms will probably involve similar steps, however. For instance, instead of producing a time-triggered schedule of tasks and messages, priorities might need to be assigned to tasks (e.g., when the RTOS uses static-priority preemptive scheduling) or messages (e.g., when the bus is CAN).
In theory, the first three steps above can be formulated in terms of an optimization problem that can be solved automatically. In practice, the degrees of freedom (unknowns of the optimization problem) are far too many for the problem to be tractable (the problem is NP-hard). Thus, some choices must be made by the human user of the method. Fortunately, in practice the user often has a good intuition about many of these choices. For instance, the decomposition of the program into tasks and the assignment of tasks to processors are often dictated by the application and topology constraints (e.g., a task sampling a sensor must be executed on the computer directly connected to the sensor).
In [39] , a method is proposed to generate TTA implementations from Lustre programs. In fact, the method uses a version of Lustre extended with annotations that allow the user to specify total or partial choices on the decomposition and assignment steps, as well as real-time assumptions (on worst-case execution times) and requirements (deadlines). The method uses a branch-and-bound algorithm to solve the scheduling problem. Decomposition is handled using the following strategy: start with a coarse decomposition; if this fails to produce a valid schedule, then refine some of the tasks, that is, "split" them into more than one task. Which tasks should be refined is determined based on feedback from the scheduler and on heuristics (e.g., task with the largest WCET, task with the longest blocking time, etc.). We refer the reader to the above paper for details.
Static scheduling of synchronous data-flow graphs on multi-processors
An older method for producing distributed implementations of synchronous designs is the one proposed in [40, 41] . This method concerns the synchronous data flow model (SDF). SDF can be viewed as a subclass of synchronous languages such as Lustre in the sense that only multi-periodic designs can be described in SDF. On the other hand, SDF descriptions are more "compact" and must generally be "unfolded" before being transformed into a synchronous program. Algorithms to perform this unfolding are provided in [40] .
Before proceeding to review the implementation of SDF, let us give an example of an SDF graph and its translation into a synchronous language. The SDF graph is shown in Figure 1.16. It consists of two nodes A and B. The arrow from A to B denotes that A produces a sequence of tokens that are consumed by B. The number 2 at the source of the arrow denotes that A produces two tokens every time it is invoked. The number 1 at the destination of the arrow denotes that B consumes one token every time it is invoked. These numbers imply that, in a periodic execution of A and B, B must be executed twice as many times as A.
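The ratio between the numbers of executions follows from requiring that, over one period, the tokens produced on the arrow equal the tokens consumed. For a single arrow this balance can be solved directly, as in the following Python sketch (the function name is ours, and graphs with several arrows would require solving all balance equations together).

```python
from math import gcd


def repetitions(produced, consumed):
    """Smallest positive (q_src, q_dst) with q_src * produced == q_dst * consumed."""
    g = gcd(produced, consumed)
    return consumed // g, produced // g


q_a, q_b = repetitions(produced=2, consumed=1)
print(q_a, q_b)  # executions of A and of B per period
```

For the graph of the example this gives one execution of A for two executions of B, in agreement with the remark above.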
In Lustre, the above SDF graph could be described as follows: The periodic_clock(k,p) macro constructs a boolean flow corresponding to a clock of period p and initial phase k.
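To give an intuition of the kind of boolean flow meant here, the following Python generator mimics such a clock. It is only a sketch under our own conventions, which may differ from those of the original macro: ticks are numbered from 0, and the phase k is taken to be the index of the first tick at which the clock is true.

```python
from itertools import islice


def periodic_clock(k, p):
    """Yield an infinite boolean flow that is true at ticks k, k+p, k+2p, ..."""
    t = 0
    while True:
        yield t >= k and (t - k) % p == 0
        t += 1


# First six values of two clocks of period 2 with different phases.
print(list(islice(periodic_clock(0, 2), 6)))
print(list(islice(periodic_clock(1, 2), 6)))
```

Under these conventions, a clock of period 1 is true at every tick from its phase on, and two clocks of period 2 with phases 0 and 1 are true on alternating ticks.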
Let us now turn to the implementation of SDF on multi-processor architectures proposed in [40] . This consists essentially in constructing a periodic admissible parallel schedule, or PAPS. The schedule is admissible in the sense that (1) node precedences are respected, that is, a node cannot execute before all required input tokens are available, (2) execution does not block, and (3) a finite amount of buffering is required to store tokens until they are consumed. The architecture considered in [40] consists of a number of homogeneous computers, in the sense that a node may be executed on any computer and it requires the same execution time on any computer. Communication delays between nodes are considered negligible. Finally, some synchronization mechanism is assumed available, which allows all processors to resynchronize at the end of a global period in order to restart the schedule.
Let us provide an example, taken from [40] . Figure 1.17 shows an SDF graph (a) and two possible schedules for it, (b) and (c). In the SDF graph there are three nodes A, B and C, with execution times 1, 2 and 3, respectively. The notation "d" means there is a unit-delay in the data-flow link (similarly, "2d" means there are two unit-delays). The schedules are for two processors.
The schedule shown in Figure 1.17(b) corresponds to a single unfolding of the SDF graph. Notice that A must be executed twice so that the two tokens needed by C are available. Also note that B may start before C finishes, since there is a unit-delay in the link from C to B. Finally, note that the period of this schedule cannot be less than 4; in other words, the rate of this implementation is 1/4.
It should come as no surprise that constructing an optimal (i.e., rate-maximizing) PAPS is a hard (combinatorial) problem. [40] proposes a heuristic algorithm, based on the so-called level-scheduling algorithm [42] .
Conclusions and perspectives
Synchronous programming is founded on the widely adopted practice of simple control-loop programs and on the synchrony hypothesis, which makes it possible to abstract from implementation details by assuming that the program is sufficiently fast to "keep up" with its real-time environment. This spirit is in full accordance with modern thinking about system design, sometimes called model-based design, which advocates the use of high-level models with "ideal", implementation-independent semantics, and the separation of concerns between design and implementation. New methods that allow the high-level semantics to be preserved by different types of implementations on various execution platforms are constantly being developed and are gaining ground in industry.
A number of challenges remain for the future. To list only a few of them:
• There is currently no good way to describe execution platforms. Although the complete synthesis of semantics-preserving implementations starting from a high-level model (e.g., a synchronous program) plus a formal description of the execution platform seems too far-reaching today (because of the complexity involved), less ambitious goals, such as reconfiguring the code when reconfiguring the platform, ultimately require a well-defined description of the execution platform and its variations.
• Regarding the preservation of semantics itself, many options are open. Preservation in the "strict" sense, such as the one described in Section 5, may not always be necessary. For instance, a mixed discrete/continuous controller may require strict preservation of determinism as regards control-flow decision points and looser, freshest-value semantics as regards robust, continuous control. Finding ways to express the preservation requirements and methods to achieve them are challenging topics for future research.
• UML [43] is rapidly establishing itself as a standard in embedded system design. Since synchronous programming is also a popular approach, this seems to call for a thorough comparison between the two approaches and, perhaps, for attempts to make them converge.
