Abstract. The popular slogan "write once, run anywhere" aptly captures the expressive capabilities of the Java programming framework for developing, deploying, and reusing target-independent applets. Its generality and simplicity have drawn most of the attention of the compiler technology community toward just-in-time and runtime compilation techniques and toward local and compositional optimization algorithms. When it comes to real-time Java and to the implementation of embedded software, however, this approach is far from satisfactory, especially in hard real-time system design (e.g. airborne systems), where conformance to real-time specifications is critical. We show that synchronous design tools, and particularly the design workbench Polychrony, allow for a complete modeling of embedded software written in a high-level, general-purpose programming language such as Java. The synchronous approach provides a formal engineering model and methodology, using global transformation and optimization techniques, that allows a Java program written once to be mapped onto any distributed target architecture. We present a technique to import a resource-constrained, multi-threaded Rt Java program, together with its runtime system Api, into Polychrony. We put this modeling technique to work by considering a formal, refinement-based design methodology that allows for a correct-by-construction remapping of the initial threading architecture of a given Java program onto either a single-threaded target or a distributed architecture. This technique makes it possible to generate stand-alone (Jvm-less) executables and to remap threads onto a given distributed architecture or a prescribed target real-time operating system. As a result, it allows for a complete separation between the virtual threading architecture of the functional-level system design (in Java) and its actual, real-time and resource-constrained implementation.
Introduction
The well-known slogan "write once, run anywhere" aptly captures the expressive capabilities of the Java programming framework for developing, deploying, and reusing target-independent applets. Its generality and simplicity have drawn most of the attention of the compiler technology community toward just-in-time and runtime compilation techniques and toward local and compositional optimization algorithms. When it comes to real-time Java (Rt Java) and to the implementation of embedded software, however, this approach is far from satisfactory, especially in hard real-time system design (e.g. airborne systems), where conformance to real-time specifications is critical.
We show that synchronous design tools, and particularly the design workbench Polychrony, allow for a complete modeling of embedded software written in a high-level, general-purpose programming language such as Java. The synchronous approach provides a formal engineering model and methodology, using global transformation and optimization techniques, that allows a Java program written once to be mapped onto any distributed target architecture.
We present a technique to import a resource-constrained, multi-threaded Rt Java program, together with its runtime system Api, into Polychrony. We put this modeling technique to work by considering a formal, refinement-based design methodology that allows for a correct-by-construction remapping of the initial threading architecture of a given Java program onto either a single-threaded target or a distributed architecture. This technique makes it possible to generate stand-alone (Jvm-less) executables and to remap threads onto a given distributed architecture or a prescribed target real-time operating system. As a result, it allows for a complete separation between the virtual threading architecture of the functional-level system design (in Java) and its actual, real-time and resource-constrained implementation.
We illustrate this technique by considering a simple yet representative case study: an even-parity checker (Epc). We show how correctness by construction is enforced from the description of its functional architecture (in Rt Java) to its mapping onto a given target architecture. This case study demonstrates the ability of the Polychrony workbench to automatically perform otherwise difficult engineering tasks such as thread remapping and target-specific deployment, starting from a given, high-level, Rt Java design.
Overview. In Section 2 we present the functional profile of Rt Java under consideration. After a brief introduction to our modeling platform Polychrony in Section 3, Section 4 describes the architecture of our modeling tool. Finally, in Section 5 we describe how to take advantage of the formal techniques of Polychrony in order to address critical issues in hard real-time system design.
A hard Rt Java profile
The Rt Java profile (i.e. the Api of the Jvm extension) considered in the present article has been implemented in the context of the Expresso project and is inspired by the Ravenscar-Java "high-integrity profile" (Hip) for Rt Java [12] (other specifications for Rt Java exist, such as [4]). It consists of a subset of the Rt Java specification aimed at meeting non-functional requirements of airborne systems. Table 1 details the most significant features of this profile (some constructs, such as memory management, are not considered in this article). The Expresso packages provide extensions to thread management classes allowing for the simulation of real-time threads and event handlers. The class AsyncEvent and its sub-classes define data-structures to encapsulate hardware interrupts and events to be recycled within the real-time Jvm by appropriate handlers (class AsyncEventHandler and sub-classes) according to the pace of the real-time kernel. Processing in nominal mode within a real-time application is performed using real-time and periodic threads (classes RealTimeThread and PeriodicThread). These make it possible to schedule the execution of a sequential piece of code according to real-time constraints (period, deadline, duration, priority), which can be set using shared data-structures defined in the SchedulingParameters class.
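To make the shape of such a periodic thread concrete, the following is a minimal sketch in plain Java. It does not use the Expresso packages: the classes SchedulingParams, PeriodicThreadSketch and SamplerSketch are hypothetical stand-ins, the sanity rule "duration <= deadline <= period" is our assumption, and time is simulated rather than measured. The sketch only illustrates the split between a start-up phase (init) and a periodic body (run) that is released once per period.

```java
// Hypothetical stand-ins: these are NOT the Expresso classes, only an illustration.
final class SchedulingParams {
    final long periodMs;
    final long deadlineMs;
    final long durationMs;
    final int priority;

    SchedulingParams(long periodMs, long deadlineMs, long durationMs, int priority) {
        // Assumed sanity rule for this sketch: 0 < duration <= deadline <= period.
        if (durationMs <= 0 || durationMs > deadlineMs || deadlineMs > periodMs) {
            throw new IllegalArgumentException("expected 0 < duration <= deadline <= period");
        }
        this.periodMs = periodMs;
        this.deadlineMs = deadlineMs;
        this.durationMs = durationMs;
        this.priority = priority;
    }
}

abstract class PeriodicThreadSketch {
    final SchedulingParams params;

    PeriodicThreadSketch(SchedulingParams params) { this.params = params; }

    abstract void init(); // executed once at start-up: the only place that allocates
    abstract void run();  // sequential code executed once per period

    /** Deterministic driver over simulated time: one release per period in [0, horizonMs). */
    final int simulate(long horizonMs) {
        init();
        int releases = 0;
        for (long t = 0; t < horizonMs; t += params.periodMs) {
            run();
            releases++;
        }
        return releases;
    }
}

final class SamplerSketch extends PeriodicThreadSketch {
    private int[] buffer; // allocated in init(), reused by run()
    private int written;

    SamplerSketch(SchedulingParams params) { super(params); }

    @Override void init() { buffer = new int[8]; written = 0; }

    @Override void run() {
        buffer[written % buffer.length] = written; // no allocation in the periodic body
        written++;
    }

    int samplesWritten() { return written; }

    public static void main(String[] args) {
        SamplerSketch s = new SamplerSketch(new SchedulingParams(10, 10, 2, 5));
        System.out.println(s.simulate(100) + " releases, " + s.samplesWritten() + " samples");
    }
}
```

A real implementation would leave the release of run to the real-time kernel; the simulate method is only there to make the sketch executable without one.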
Airborne software requirements
The Hip profile for Rt Java further describes programming guidelines for the use of this profile to meet structural and functional requirements imposed by certification authorities. The syntactic simplicity of these requirements makes them easy for users to translate into programming guidelines, easy to implement as syntactic checks with the help of a modeling tool, and easy to analyze using rate-monotonic analysis techniques.
- A program consists of a fixed number of threads.
- Thread and memory allocation is performed during service startup.
- No dynamic memory management during operational service.
- Threads have access to the scope of the program.
- Threads are either periodic or sporadic.
- Threads use synchronization to avoid priority inversion.
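Because the number of threads is fixed and each periodic thread carries a period and a duration, the requirements above lend themselves to rate-monotonic analysis. As an illustration, the sketch below applies the classical Liu and Layland utilization bound, under which n periodic tasks are schedulable by rate-monotonic priorities if their total utilization does not exceed n(2^(1/n) - 1). The test is sufficient but not necessary, and the class name RmaSketch and the task figures are ours, not taken from the profile.

```java
final class RmaSketch {
    /** Total processor utilization: sum of duration/period over all tasks. */
    static double utilization(double[] durations, double[] periods) {
        if (durations.length != periods.length) {
            throw new IllegalArgumentException("one duration per period expected");
        }
        double u = 0.0;
        for (int k = 0; k < durations.length; k++) {
            u += durations[k] / periods[k];
        }
        return u;
    }

    /** Liu and Layland bound for n tasks: n * (2^(1/n) - 1). */
    static double liuLaylandBound(int n) {
        return n * (Math.pow(2.0, 1.0 / n) - 1.0);
    }

    /** Sufficient (not necessary) schedulability test under rate-monotonic priorities. */
    static boolean passesSufficientTest(double[] durations, double[] periods) {
        return utilization(durations, periods) <= liuLaylandBound(durations.length);
    }

    public static void main(String[] args) {
        double[] durations = {1, 1, 2};  // illustrative worst-case durations
        double[] periods = {4, 5, 10};   // illustrative periods
        System.out.println("utilization = " + utilization(durations, periods));
        System.out.println("bound       = " + liuLaylandBound(durations.length));
        System.out.println("passes      = " + passesSufficientTest(durations, periods));
    }
}
```

A task set that fails this test is not necessarily unschedulable; it merely requires a finer analysis, such as response-time analysis.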
In the present study towards modeling Rt Java within the synchronous multiclocked design workbench Polychrony, we center our focus on this very subset of Java and demonstrate that its functionalities fit within our model of computation. 
An intermediate format
Figure 1 (excerpt of the Jimple grammar): run ::= blk | run; run (sequence of blocks)
In the Jimple grammar (Figure 1), a local variable is noted l, a constant c, a type t, a program label L, and a class field name x. The run sequence of a thread consists of a sequence of blocks blk, each of which consists of a label L, a sequence of operations stm, and a return statement rtn. A declaration (grammar dec) may not occur in the run method of a thread or in an event handler, as it might dynamically allocate memory. Hence, all declarations are assumed to be present in the main initialization class of the program, or in the init method of thread classes (executed once at system start). As a result, the Soot toolbox provides an appropriate front-end to resolve high-level object-oriented features and perform specific optimizations, allowing us to focus on the translation of the Jimple imperative notation into the multi-clocked data-flow design language Signal, presented next. Jimple produces explicitly typed and initialized declarations as well as explicit locks in the presence of exceptions (monitors are released before exceptions are raised).
Polychrony for embedded system design
The Polychrony workbench implements a multi-clocked synchronous model of computation (the polychronous MoC [9]) to model control-intensive embedded software using the Signal language. It supports a formal, refinement-based design methodology [13] with companion decision procedures that validate key design steps using precise formal design properties (e.g. controllability of a component by its environment, or invariance of a design refinement under flow-equivalence) and/or automatic design transformations and refinement algorithms (control hierarchization, protocol synthesis).
Clocks and causality relations. In Signal, sequential code generation (to, e.g., ANSI C, JAVA, VHDL) is a design stage performed on a given model (e.g. the modulo 3 counter, figure 3) after synchronization relations (e.g. signals i, s and o are synchronous, written o = s = i, figure 3) and scheduling relations (e.g. c and o cannot happen before s when i is present, written s →^i c and s →^i o, figure 3) have been inferred. This relational information is used to construct a global and canonical control flow graph (described next). This transformation makes it possible, for instance, to synthesize a single automaton from the model of multiple threads. Conversely, it makes it possible to synthesize synchronization protocols that correctly map these threads onto a distributed architecture.
(Figure 3, the modulo 3 counter: model, clocks, schedule, code.)
The hierarchization of the control flow graph (Figure 4, from [3]) is the key transformation performed by Polychrony on a given Signal specification. Given an inferred clock relation (e.g. h3 = h1 op h2, below), it optimally places clocks (e.g. h3, below) in the control-flow tree by determining their least upper bound (e.g. h, s.t. h > h1 and h > h2, below). Each clock (e.g. c for (s == 2), above) is the trigger of a set of actions that are scheduled in sequence by respecting the inferred scheduling relations (e.g. s →^i o, above).
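To give an intuition of what sequential code derived from such clock and scheduling relations may look like, here is a hand-written sketch of a modulo 3 counter in Java. It is not the output of the Signal compiler, and since figure 3 is not reproduced here, the exact behavior is assumed: we take s to be the counter state updated at each occurrence of the input i, o to carry the value of s, and c to be the sub-clock at which (s == 2) holds. The class name Mod3CounterSketch is hypothetical.

```java
final class Mod3CounterSketch {
    private int s = 0;          // state: value of s at the previous reaction (assumed initial value 0)
    private boolean c = false;  // whether the sub-clock c = (s == 2) ticked in the last reaction

    /**
     * One reaction, triggered by an occurrence of the input i.
     * The order of the statements follows the scheduling relations:
     * s is computed first, then c and o, which both depend on s.
     */
    int step() {
        s = (s + 1) % 3;  // action at the master clock: o = s = i
        c = (s == 2);     // sub-clock c is present only when s equals 2
        return s;         // output o, synchronous with s and i
    }

    boolean cTicked() { return c; }

    public static void main(String[] args) {
        Mod3CounterSketch counter = new Mod3CounterSketch();
        for (int k = 0; k < 6; k++) {
            int o = counter.step();
            System.out.println("o = " + o + ", c present: " + counter.cTicked());
        }
    }
}
```

The point of the sketch is the structure: one step function per master clock tick, with actions of sub-clocks guarded by the conditions that define them.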
An introduction to the Apex-Arinc Rtos model of Polychrony
The Apex interface, defined in the avionics standard Arinc [2], provides avionics application software with the basic services needed to access operating system resources. Its definition relies on the Integrated Modular Avionics approach (Ima [1]). A main feature of an Ima architecture is that several avionics applications (possibly with different levels of criticality) can be hosted on a single, shared computer system. An essential issue is then to ensure safe allocation of shared computer resources in order to prevent fault propagation from one hosted application to another. This is addressed through functional partitioning of the applications w.r.t. available time and memory resources. The allocation unit that results from this decomposition is the partition (see Figure 7 in the Epc example). A partition is composed of processes, which represent the executive units (an Arinc partition/process is akin to a Unix process/task). When a partition is activated, the processes it owns run concurrently to perform the functions associated with the partition. The process scheduling policy is priority preemptive. Each partition is allocated to a processor for a fixed time window within a major time frame maintained by the operating system. Suitable mechanisms and devices are needed for communication and synchronization between processes (e.g. buffer, event, semaphore) and between partitions (e.g. ports and channels). The Apex interface includes services for communication and synchronization as well as services for the management of processes and partitions. Such services have been modeled in Polychrony [6] and its commercial implementation (Rt-builder, by Tni-Valiosys, http://www.tni-valiosys.com) is used at Hispano-Suiza and Airbus Industries. We use these services to model and implement hard Rt Java applications.
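The allocation of partitions to fixed time windows within a major time frame can be pictured as a static, cyclic schedule table. The sketch below is our own illustration of that idea, not the Apex Api and not the Polychrony model of it: the class MajorFrameSketch, the partition names, and the window lengths are all hypothetical, and windows are assumed to be laid out back to back.

```java
final class MajorFrameSketch {
    private final String[] partitions; // partition owning each window
    private final long[] starts;       // start offset of each window in the major frame
    private final long[] lengths;      // length of each window
    private final long majorFrame;     // total length of the cyclic schedule

    /** Windows are assumed to be laid out back to back, in the given order. */
    MajorFrameSketch(String[] partitions, long[] lengths) {
        if (partitions.length == 0 || partitions.length != lengths.length) {
            throw new IllegalArgumentException("one positive length per partition expected");
        }
        this.partitions = partitions.clone();
        this.lengths = lengths.clone();
        this.starts = new long[lengths.length];
        long offset = 0;
        for (int k = 0; k < lengths.length; k++) {
            if (lengths[k] <= 0) throw new IllegalArgumentException("window length must be positive");
            starts[k] = offset;
            offset += lengths[k];
        }
        this.majorFrame = offset;
    }

    /** Partition that owns the processor at time t (t >= 0); the schedule repeats every major frame. */
    String activeAt(long t) {
        long offset = t % majorFrame;
        for (int k = 0; k < partitions.length; k++) {
            if (offset < starts[k] + lengths[k]) return partitions[k];
        }
        throw new IllegalStateException("unreachable: windows cover the whole major frame");
    }

    public static void main(String[] args) {
        MajorFrameSketch frame = new MajorFrameSketch(new String[]{"IO", "EPC"}, new long[]{20, 30});
        for (long t = 0; t < 100; t += 15) {
            System.out.println("t = " + t + " -> " + frame.activeAt(t));
        }
    }
}
```

Within the window of the active partition, the processes of that partition would then be scheduled by priority, which the sketch does not model.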
A hard Rt Java plugin for Polychrony
We introduce our method to translate and model multi-threaded real-time Java programs in Polychrony. Starting from a given, multi-threaded Rt Java design, Polychrony makes it possible to remap the functional thread architecture and to retarget the design onto a different target real-time operating system. Furthermore, it can generate stand-alone (Jvm-less) executables. As a result, it allows for a complete separation between the virtual threading architecture of the functional-level system design and its actual, real-time implementation.
4.1 Modeling the Rt Java virtual machine using Polychrony.
Modeling a Rt Java application in Polychrony requires distinguishing between the architecture of the application and its periodic threads and event handlers. The architecture (i.e. shared data-structures and thread parameters) can be obtained by scanning the main (initialization) class of the Java program as well as the init methods of thread classes. The periodic threads and event handlers are described in the run methods of the thread classes. The init methods make use of operating system support (the Rt Java Api) that is modeled using the Apex services library of Polychrony.
The architecture of the entire framework is formed by the Apex services of Polychrony that model the Rt Java Api and correspond to the virtual machine. The structure of the actual Java program, that describes the functional threading and memory architecture, is translated to instances of Apex services and Arinc partitions. Java threads and event handlers are translated into Signal processes.
Modeling
[...] gives the main entry label of run, and pred gives all predecessors of label L in the graph (Figure 7).
A real-time periodic thread (or event handler) is viewed as a sequence of critical sections that receive control from the partition-level scheduler via a clock tick, which triggers the execution of the thread, and a variable next block, which directs control to the block to be executed next in the thread. The type of the next block variable is the enumeration of the block label names L, which are referred to as #L in Signal.
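The control scheme just described, a tick from the scheduler plus a next block variable ranging over block labels, can be mimicked in plain Java as follows. This is only an analogy written by hand, not code produced by the translator: the class BlockThreadSketch, its three labels, and the summation it computes are hypothetical, chosen to show one block executing per tick and each block ending by selecting its successor.

```java
final class BlockThreadSketch {
    /** Block labels; they play the role of the enumeration #L0, #L1, #L2. */
    enum Label { L0, L1, L2 }

    private Label nextBlock = Label.L0; // block to execute at the next tick
    private int n;                      // loop counter (illustrative thread state)
    private int acc;                    // accumulated result
    private boolean finished = false;

    BlockThreadSketch(int n) { this.n = n; }

    /** Called by the scheduler: executes exactly one block, then records which block comes next. */
    void tick() {
        switch (nextBlock) {
            case L0: // entry block: initialization, then jump to the loop
                acc = 0;
                nextBlock = Label.L1;
                break;
            case L1: // loop block: "if n > 0 goto L1 else goto L2"
                if (n > 0) {
                    acc += n;
                    n--;
                    nextBlock = Label.L1;
                } else {
                    nextBlock = Label.L2;
                }
                break;
            case L2: // exit block: the result is available
                finished = true;
                break;
        }
    }

    boolean isFinished() { return finished; }
    int result() { return acc; }

    public static void main(String[] args) {
        BlockThreadSketch thread = new BlockThreadSketch(3);
        int ticks = 0;
        while (!thread.isFinished()) {
            thread.tick();
            ticks++;
        }
        System.out.println("result " + thread.result() + " after " + ticks + " ticks");
    }
}
```

Between two ticks the thread holds no control state other than nextBlock, which is what allows a scheduler to interleave several such threads or to merge them into a single automaton.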
A section or block blk consists of a sequence of elementary statements stm delimited by a label L and a return statement rtn. A section label defines the entry point for a given transition. Hence, it is the symbolic value of the global state variable next block of use in the current Apex partition. A block label is denoted by an event: it is present iff the corresponding block is active during the current transition. Every statement of the block (computation stm or control rtn) is conditioned by that clock. A statement stm takes three forms:
The definition l := r of a local variable l. The use of local variables in Jimple code facilitates data-flow analysis. In the case of a location, it guarantees that the reference r is read and written once within a given block or section. The reference is translated to the previous value of the corresponding signal and the local variable is translated by a local (volatile) signal. In the case of a method, the reference is associated to the local l in the environment E of the translator.
The call v = invoke i (i*) to an external method, that is, a method whose byte-code is not available to Soot. All methods from available classes are inlined in the Jimple code of the thread, in order to globally optimize its control flow.
A native operation v = i ⋆ j on immediate values i and j (where ⋆ stands for the native operator) is directly translated into the corresponding equation, scheduled at the context clock C.
Lock monitoring is modeled by inlined Signal processes (e.g. entermonitor and exitmonitor). Control, via goto or if, consists in the activation of the clock that corresponds to the target block label. The handling of exceptions does not depart from this scheme: an exceptional control-flow branch is produced by Soot to handle a throw in a way similar to a goto, by associating the corresponding catch with a label. In the particular case of the return statement, translation is performed by installing the corresponding pattern of the Apex protocol at partition level (see Figure 7). Variables v and references r are translated as is (i.[x] as i.x (a datum) or i.x (a method), parameter c stays as is, this is removed, etc.).
A case study
To illustrate our modeling principles and to outline their application to the implementation of a formal, refinement-based design methodology, we study the refinement of the functional architecture of an even-parity checker (Epc) towards its distributed implementation. This case study, although small compared to what the tool could handle, demonstrates the capability of the Polychrony workbench to automatically perform otherwise challenging engineering tasks such as thread remapping, generation of stand-alone executables, or target-specific deployment, starting from a given, high-level Rt Java design.
The Epc consists of three functional units shown in Figure 6: an IO interface process, a master even-parity check process (even), and a slave bit-counting process (ones). The IO process gives a start signal to the even process and then waits for the signal done to read the result. On start, even reads the input data, passes it to the process ones, and notifies it with the istart signal. Ones counts the number of bits of the input data that are true and notifies even upon completion. Even finally checks whether the result is an even number and notifies IO.
Fig. 7. Architecture of the Arinc partition for the Epc.
Example (thread ones). Having grasped the structure of the whole system and the architecture (Figure 7) of our case study, we take a closer look at the translation of one specific thread, the ones thread. The concurrency model of Rt Java starts at a design level where implicit architecture choices are already made: the system consists of a set of threads that interact via shared variables and locks. The thread ones (Figure 8, left) contributes to determining the parity of the input data by counting its set bits. Upon receipt of the start notification, ones shifts data until it becomes 0; the internal count is then assigned to the output ocount and done is notified. The thread even notifies ones to start processing data, waits until done is notified to read the final ocount, and checks whether it is an even number. The Signal model of thread ones (figure 8) consists of one critical section, delimited by a pair of monitor statements. The process is activated when it obtains the lock on istart. Then, at its own rate (now conditioned by the clock c1), it determines the result. When it is finished, it sends the notification. Native operators, e.g. the unsigned operators >> and &, or separately compiled functions can be imported into the model as external functions with a behavioral type specification (the spec declaration).
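For readers without access to Figure 8, the functional core of the two threads can be summarized by the sketch below. It deliberately leaves out the threading, locks, and notifications (istart, done) and keeps only the computation described above: ones shifts the data until it becomes 0 while counting the bits that are set, and even checks whether that count is an even number. The class name EpcSketch is ours, and the unsigned shift >>> is used so that the loop also terminates on negative inputs.

```java
final class EpcSketch {
    /** Core of thread ones: shift data right until it becomes 0, counting the bits that are set. */
    static int ones(int data) {
        int count = 0;
        while (data != 0) {
            count += data & 1; // add the lowest bit to the count
            data >>>= 1;       // unsigned shift: zeros enter from the left
        }
        return count;          // this value plays the role of ocount
    }

    /** Core of thread even: the parity is even iff the bit count is an even number. */
    static boolean even(int data) {
        return (ones(data) & 1) == 0;
    }

    public static void main(String[] args) {
        int data = 0b1011;
        System.out.println("ones = " + ones(data) + ", even parity: " + even(data));
    }
}
```

In the threaded design each iteration of the loop is a step of ones taken at its own rate, and the final count is handed to even through the shared variable ocount.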
Example (architecture of the Epc main). To illustrate how a Rt Java architecture description (as specified in the main method of its top-level class) is analyzed and used to generate a Signal instance of the Apex services that model it, we consider the case of the Epc class parity (Figure 9). The processing of the main method of the parity class starts with a linear analysis of its declarations and dec statements. This analysis produces a tree structure in which each node consists of a Signal data structure that renders the category of each of the items initialized in this method (shared data-structure, periodic or sporadic thread, event handler) together with its initialization parameters (size, real-time parameters, trigger and handler). Once the main method is scanned, the translation of real-time threads and event handlers starts, in order to determine the remaining architecture parameters from the init method of each class and the number of critical sections from the run method of each class. These data are used to instantiate the generic Apex service models of Polychrony (figure 9) and to finalize the model of the application architecture. This yields the structure depicted in Figure 9 for the ones thread: a control process is connected to the partition-level scheduler and to a computation process.
In this section, we demonstrate the use of our modeling tool to check the refinement of the even-parity checker (Epc) from its initial specification towards its distributed implementation. This is done by inserting communication protocols and by merging secondary threads on a given architecture. Our goal is to demonstrate how our polychronous design tools and methodologies allow us to consider high-level system component descriptions and to refine them in a semantics-preserving manner into a Gals implementation (globally asynchronous, locally synchronous, a.k.a. Kahn network).
Architecture design refinement. To model the physical distribution of the threads ones and even, and in order to allow them to communicate asynchronously via a channel structure, we install a double handshake protocol between them. The installation of this channel incurs a desynchronization between the two processes, hence a potential change of behavior. The model of the send and recv methods in Signal is obtained from a message sequence specification (Figure 10, left). The ready and ack flags stand for state variables declared in the lexical scope of send and recv in the module that defines the protocol; eReady and eAck stand for events. Sender and receiver use a simplified wait/notify mechanism similar to that of asynchronous event handlers. In [13], we show that the validation of a protocol insertion such as the Epc model upgrade with a double-handshake protocol amounts to checking that the initial and upgraded designs have equivalent flows, which is amenable to model checking by proving that both design outputs always return the same sequence of values.
Remapping threads onto a target. The encoding of the even-parity checker demonstrates the capability of Signal to provide multi-clocked models of Java components for verification and optimization purposes. Polychrony allows for a better decoupling of the specification of the system under design from early architecture mapping choices. For instance, the Signal compiler can be used to merge the behaviors IO and even and to combine their control flow using the clock hierarchization algorithm.
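As an illustration of the double handshake protocol discussed under architecture design refinement, the following Java sketch implements send and recv over two flags, ready and ack, using the wait/notify mechanism of Java monitors. Figure 10 is not reproduced here, so the exact message sequence is an assumption on our part: we read "double handshake" as the usual four-phase exchange (raise ready, raise ack, lower ready, lower ack). The class HandshakeChannelSketch and its transfer helper are hypothetical, and the channel supports a single sender and a single receiver.

```java
final class HandshakeChannelSketch {
    private boolean ready = false; // raised by the sender: a datum is available (event eReady)
    private boolean ack = false;   // raised by the receiver: the datum was read (event eAck)
    private int slot;              // the datum being transferred

    synchronized void send(int value) throws InterruptedException {
        slot = value;
        ready = true;
        notifyAll();            // phase 1: signal that the datum is ready
        while (!ack) wait();    // phase 2: wait for the acknowledgement
        ready = false;
        notifyAll();            // phase 3: withdraw the request
        while (ack) wait();     // phase 4: wait until the acknowledgement is lowered
    }

    synchronized int recv() throws InterruptedException {
        while (!ready) wait();  // phase 1: wait for a datum
        int value = slot;
        ack = true;
        notifyAll();            // phase 2: acknowledge
        while (ready) wait();   // phase 3: wait until the request is withdrawn
        ack = false;
        notifyAll();            // phase 4: lower the acknowledgement
        return value;
    }

    /** Sends the given values from the calling thread to a receiver thread; returns what arrived. */
    static java.util.List<Integer> transfer(int... values) {
        HandshakeChannelSketch channel = new HandshakeChannelSketch();
        java.util.List<Integer> received = new java.util.ArrayList<>();
        Thread receiver = new Thread(() -> {
            try {
                for (int k = 0; k < values.length; k++) received.add(channel.recv());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        try {
            for (int v : values) channel.send(v);
            receiver.join(); // join makes the receiver's writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(1, 2, 3));
    }
}
```

The flow-equivalence property mentioned above corresponds, in this sketch, to the fact that the receiver observes the same sequence of values that the sender emitted, whatever the relative speed of the two threads.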
Then, checking that the merge of the threads IO and even is correct w.r.t. the initial specification amounts to checking that it is deterministic. As demonstrated in [9], this is done by checking that the clock of its output can be determined from that of its input data and from the master simulation tick, and by checking that the frequency of the output is lower than that of both the input data and the tick. In [5], experiments made in the context of the Rntl-Expresso project are reported, showing that for several case studies (multi-threaded Rt Java programs consisting of five to ten threads), the speedup obtained by automatically remapping all threads into a stand-alone (Jvm-less), serialized executable was on average 300% compared to a commercial Java compiler implementing the Rt Java specification (TurboJ) and to the standard Java compiler (JCC).
Note that the construction of a single automaton from multiple Java threads may not always be possible: the corresponding Signal model does not necessarily exhibit a single control-flow tree after hierarchization (figure 4). However, it may still be possible to repartition the model into a smaller set of threads (at most as many as in the initial design), each having a hierarchizable control-flow tree.
Proving global design invariants. In addition to the methodology-specific verification issues addressed in the present article, the model checker of Signal makes it possible to prove more general properties of specification requirements: reachability, attractivity, and invariance (see [10] for details). Example applications include checking that the refinement of a model with a finite Fifo buffer satisfies requirements such as "one never reads an empty Fifo queue" or "one never writes to a full Fifo queue".
Such requirements have previously been studied in [7], in the context of the modeling of Apex avionics applications in Signal and the companion design methodologies. Another common requirement is non-interference: e.g. a lock is never requested from two concurrently active threads. In Signal, this problem reduces to a satisfaction problem. Suppose two requests to a lock, lock ::= b1 when c1 and lock ::= b2 when c2. Checking non-interference amounts to proving that c1 ∧ c2 = 0 w.r.t. the clock constraints. These constraints are inferred by the Signal compiler for a specific hard real-time implementation.
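The non-interference check c1 ∧ c2 = 0 can be illustrated on a toy scale by brute force. In the sketch below, a clock is approximated as a Boolean condition over the values of n Boolean signals, and the clock constraints as one more such condition; the check enumerates all valuations and looks for one that satisfies the constraints while making both clocks present. This is only a didactic stand-in for the symbolic techniques of the Signal compiler and model checker, and the names NonInterferenceSketch, Clock and exclusive are hypothetical.

```java
final class NonInterferenceSketch {
    /** A clock, approximated here as a Boolean condition over n Boolean signal values. */
    interface Clock {
        boolean present(boolean[] env);
    }

    /**
     * Returns true iff c1 and c2 are exclusive under the given constraints,
     * i.e. no valuation allowed by the constraints makes both clocks present.
     * Exhaustive enumeration: only meant for a small number n of signals.
     */
    static boolean exclusive(Clock c1, Clock c2, Clock constraints, int n) {
        for (int bits = 0; bits < (1 << n); bits++) {
            boolean[] env = new boolean[n];
            for (int k = 0; k < n; k++) {
                env[k] = ((bits >> k) & 1) == 1;
            }
            if (constraints.present(env) && c1.present(env) && c2.present(env)) {
                return false; // counterexample: both lock requests may occur together
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two requests guarded by independent conditions x0 and x1.
        Clock c1 = env -> env[0];
        Clock c2 = env -> env[1];
        Clock unconstrained = env -> true;
        Clock neverBoth = env -> !(env[0] && env[1]); // an inferred constraint, for illustration
        System.out.println("without constraint: " + exclusive(c1, c2, unconstrained, 2));
        System.out.println("with constraint:    " + exclusive(c1, c2, neverBoth, 2));
    }
}
```

The second call shows why the inferred clock constraints matter: two requests that could interfere in isolation may be provably exclusive in the context of a specific implementation.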
Related work and conclusions
The present work is based on previous results in the Polychrony project. It uses the polychronous model of computation [9], the model of the Apex real-time operating system standard [6], and a formal refinement-based design methodology [13]. Ptolemy [11] and Rosetta [8] are approaches that partly aim at a similar goal. They examine system implementations using different computation models. The synchronous model, which we consider, is only one of these, but it allows the treatment of more general, multi-clocked systems.
We presented a new engineering technique allowing for the integrated modeling, optimization, verification, and simulation of embedded systems in a functional subset of the Rt Java specification that is compliant with certifiable software engineering requirements in avionics. This platform-based design, using the multi-clocked synchronous framework Polychrony, makes it possible to perform precise and aggressive optimizations and transformations that would be hard, if not impossible, to achieve using common techniques.
