In this paper we present a formal framework to verify timing properties of embedded systems. We propose a process calculus as an intermediate model to map between language-level constructs of process-based specification and implementation models, and Petri net operations. We present an elegant translation scheme to generate Petri nets starting from the intermediate process expressions. The approach has been applied to verify freedom from deadlock in a QAM modem design, with promising results.
Introduction
As embedded systems become increasingly complex, they become increasingly prone to errors and difficult to design. In several large-scale design projects in digital communications [4, 8], we have encountered a number of subtle errors during both the specification and implementation phases that were difficult to trace using traditional analysis methods such as simulation. Since most modern embedded systems are concurrent in nature, and are usually implemented on a heterogeneous architecture containing multiple hardware/software components, hard-to-detect errors can result from unanticipated interactions between the concurrent parts. Over the past years, a number of automated formal verification methods [5, 6, 12, 15] have emerged that can address some of these verification problems. However, to leverage these methods, the specification model used to represent the system must be formal in the first place, as ambiguity in specification semantics is in itself a major source of errors.
In this paper, we propose a formal model based on Petri nets [13] for reasoning about the behavior of a concurrent system. We have chosen the Petri net formalism because it is well suited to model concurrency, choice, and causality, and because there is a wealth of formal verification techniques [15, 18] that can exploit the inherent partial order properties explicitly captured in the model. However, a major problem when using Petri nets as a model for verification is the lack of a formal mapping between constructs in the (system) specification language itself and Petri net operations. We address this problem by providing such a formal mapping using a CSP [9] / CCS [10]-like process calculus as an intermediate model. The underlying idea is that many of the existing process-based specification languages (e.g. [2]) can be translated to what we call "intermediate code" of this process calculus, which can subsequently be translated into a Petri net representation by applying a syntax-directed mapping scheme. Once at the Petri net level, existing formal verification methods can be leveraged. Because the overall mapping is semantics preserving, these methods can be used to validate the various intermediate specifications of the design process.
The impact of this work on the design of embedded systems is twofold. First, at the specification phase, errors can be detected quickly without requiring extensive simulations. Second, when designing an embedded system, many problems occur at the hardware/software interface. Therefore, a design support tool has been developed [19] to assist the designer in the mapping of software components on programmable processors. Such a tool assembles the hardware/software interface by selecting and combining I/O scenarios from a library. Individual scenarios are correct by construction. However, when assembling these scenarios to construct the complete interface, subtle (timing) problems may arise, as will be shown in section 5. The formal framework presented in this paper allows such problems, for example possible deadlocks, to be diagnosed. This necessitates the modeling of the complete hardware/software interface, including the processor core, run-time operating system, hardware and software device drivers, etc.
The remainder of this paper is organized as follows. Section 2 presents the basic concepts of the used process calculus. Section 3 reviews the basic definitions of Petri nets, and details the syntax-directed translation procedure of process expressions to Petri nets. Section 4 highlights the verification framework. Section 5 extensively describes a case study. Finally, conclusions are drawn in section 6.
Process Calculus
In our model we start from a set of given atomic rendezvous actions. These actions are taken to be indivisible, and form the "leaf" processes of the model. The occurrence of an action is called an event, and it is the result of two concurrent processes (in some cases the environment is implicitly assumed to be the counterpart process) both engaging in the execution of the event under consideration. Every event can thus be seen as a binary synchronization. Two types of events exist: visible events and hidden events. Visible events represent external synchronizations (i.e. those that are externally observable); the set of all visible events is denoted as Vis. Hidden events represent internal synchronizations and are denoted by the special symbol τ. The environment cannot prevent a process from engaging in a hidden event. Indeed, as τ denotes an internal synchronization and all synchronizations are binary, it cannot be synchronized upon by any other process.
Processes are built from other (leaf) processes by means of the following operators: sequential composition, deterministic and non-deterministic choice composition, parallel composition, recursion and interrupt composition.
Processes are denoted by process expressions. We also assume a set Var of process variables denoted by x, y, etc. The set PE of all possible process expressions is defined by the following production system, where R is used as start symbol:
T o is T without free occurrences of variables.

In the following, the intuitive meaning of the different operators is described.
Sequential Composition of two processes P and Q, denoted P.Q, is a process that starts the execution of Q after the successful completion of P. Successful completion of a process P is defined as process P reaching a wanted state from which no actions can be performed; the state must be a wanted one because we have to exclude the deadlock state. This operator constitutes a mixture of the prefix and sequential composition operators, as they are used in CSP [9].

Choice. The deterministic choice of two processes P and Q, denoted P + Q, is the process that can do either P or Q, where the choice is made by the environment. The non-deterministic choice of two processes P and Q, on the other hand, denoted P ⊓ Q, is the process that can do either P or Q, where the choice is made by the process itself. In both cases, the decision of which alternative to take should be made at the beginning of the choice. Notice that the notation P ⊓ Q is a shorthand for the process (τ.P) + (τ.Q). Indeed, as τ denotes a hidden or internal synchronization event, the outside world cannot interfere at the beginning of the choice; (τ.P) + (τ.Q) then degenerates to a non-deterministic choice. This construction is borrowed from CCS [10].

In the sequel we use the terms "event" and "action" interchangeably.
Parallel Composition of two processes P and Q, denoted P || Q, executes the processes P and Q simultaneously and independently, except for events that are common to both processes; if both processes contain an event a, this event can only occur if both parties are ready to engage in a. With the exception that we only allow binary synchronization, this operator is also used in CSP.
Interrupt Composition of process P by a process Q, denoted as P ▷ Q, is the process that behaves like P but which is interrupted on the occurrence of the first event of Q. When interrupted, process P arrives in a halt state, and P ▷ Q behaves like Q. After Q has terminated successfully, P leaves its halt state and P ▷ Q resumes P, until possibly being interrupted again. If P terminates unsuccessfully, it is still possible for P ▷ Q to behave like Q. If P terminates successfully, on the other hand, Q cannot start execution any more. The semantics of the interrupt composition, as presented here, differ from those defined by CSP and LOTOS [11]. In the latter models, an interrupt is interpreted as an abort; when a process Q "interrupts" a process P, process Q is executed, but P is never resumed. Clearly, this kind of preemption cannot be reduced to the interrupt behavior of today's processors.
Recursion is introduced by considering solutions of equations of the form x = F(x), denoted as μx.F(x), with F(x) an expression in the (process) variable x, constructed solely in terms of the sequential composition, choice composition, parallel composition and interrupt composition operators. Notice that an expression beginning with a variable can only be used in a composition if it is preceded by
Apart from the interrupt operator, our process calculus is no more expressive than CSP and CCS. The key idea of our approach is that many of the existing process-based languages (e.g. [2]) used to specify heterogeneous systems can be translated to "intermediate code" of this calculus, which then serves as a starting point for performing a syntax-directed Petri net translation, as described in the next section.

Example. Consider the following program fragment, written in C, as it can be found in a process description within the CoWare™ data model [2]:
The send(a, x) statement initiates a communication to send data along channel a. If the environment, at the other side of the channel, is not ready to receive data along this channel, the program is halted at this point. As soon as the environment becomes ready, data is transferred, and both parties proceed independently. The occurrence of this communication is modeled by the atomic event a. For the receive(b, y) and send(c, (x+y)) statements the same reasoning applies. This program fragment is then modeled as X = (a.b.X) ⊓ c. Every time we encounter the while statement, the decision of (re-)entering the loop is made internally (!(x%y)); since we abstract from all internal data, this decision cannot be influenced by the environment. As a result, it is modeled as a non-deterministic choice.
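To make the structure of such process expressions concrete, the following minimal Python sketch represents them as a small syntax tree. The class names, the `nd_choice` helper and the `show` printer are illustrative assumptions and not part of the calculus definition; the encoding of the example assumes the fragment is modeled by the events a, b and c, with the non-deterministic choice expanded into its shorthand (τ.P) + (τ.Q).

```python
from dataclasses import dataclass

TAU = "tau"  # name used here for the hidden event (an assumption of this sketch)

@dataclass(frozen=True)
class Action:          # atomic rendezvous action (leaf process)
    name: str

@dataclass(frozen=True)
class Var:             # process variable
    name: str

@dataclass(frozen=True)
class Seq:             # sequential composition P.Q
    left: object
    right: object

@dataclass(frozen=True)
class Choice:          # deterministic choice P + Q
    left: object
    right: object

@dataclass(frozen=True)
class Rec:             # recursion mu x.F(x)
    var: str
    body: object

def nd_choice(p, q):
    """Non-deterministic choice, expanded as (tau.P) + (tau.Q)."""
    return Choice(Seq(Action(TAU), p), Seq(Action(TAU), q))

def show(e):
    """Render an expression as text."""
    if isinstance(e, (Action, Var)):
        return e.name
    if isinstance(e, Seq):
        return f"{show(e.left)}.{show(e.right)}"
    if isinstance(e, Choice):
        return f"({show(e.left)} + {show(e.right)})"
    if isinstance(e, Rec):
        return f"mu {e.var}.{show(e.body)}"
    raise TypeError(f"unknown expression: {e!r}")

def visible_events(e):
    """Collect the names of all visible (non-tau) actions."""
    if isinstance(e, Action):
        return set() if e.name == TAU else {e.name}
    if isinstance(e, Var):
        return set()
    if isinstance(e, Rec):
        return visible_events(e.body)
    return visible_events(e.left) | visible_events(e.right)

# The loop of the example: repeat a then b, or leave the loop and do c.
loop = Rec("X", nd_choice(Seq(Action("a"), Seq(Action("b"), Var("X"))),
                          Action("c")))
print(show(loop))
```

Parallel and interrupt composition would be added as two further binary node types in the same style.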
Syntax-directed Translation to Petri Nets
In this section the translation of process expressions into Petri nets [13] is described in more detail. In section 3.1 we review some of the basic definitions and properties of Petri nets. In section 3.2 the composition operators are defined at the Petri net level. In section 3.3 we discuss the translation process itself.
Basic Definitions and Properties
Definition 3.1 (Labeled Petri Net) A labeled Petri net
In the above definition P denotes a set of places, A a set of actions, F the set of transitions and m0 an initial marking. For a transition t = (p, a, q), a denotes the label or the action of t, whereas p and q are often referred to as the sets of input and output places of t, respectively.
Besides the structure of a Petri net, there are also associated dynamics. A state, or marking, is a mapping of the places to the natural numbers, indicating the number of tokens in each place. Transitions between states are dictated by the following firing rule. In the sequel M_P denotes the set of all states (markings) of a Petri net with |P| places.

The set of all reachable states is represented in a reachability graph. In such a reachability graph every vertex corresponds to a valid marking of the Petri net and every arc corresponds to a transition from one marking to another due to the firing of some transition in the net. The reachability graph of a Petri net N, denoted as RG(N), can then be interpreted as the reflexive transitive closure of the next-state relation defined in definition 3.3.
Definition 3.2 (Enabling Rule) Let
Two important properties of Petri nets are liveness and safeness. Liveness concerns the question whether a transition can ever be fired, and is clearly opposed to deadlock. Safeness means that a place does not contain more than one token at any time.
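The notions of enabling, firing, reachability and deadlock can be illustrated with a short Python sketch. It assumes the standard rules for safe nets: a transition (p, a, q) is enabled when all places in p are marked, and firing it removes the tokens from p and adds tokens to q. Markings are represented as sets of marked places, and all function names are hypothetical.

```python
from collections import deque

def enabled(marking, transition):
    """A transition is enabled when all of its input places are marked."""
    inputs, _label, _outputs = transition
    return inputs <= marking

def fire(marking, transition):
    """Firing removes the input tokens and produces the output tokens."""
    inputs, _label, outputs = transition
    return frozenset((marking - inputs) | outputs)

def reachability_graph(transitions, m0):
    """Breadth-first exploration of all markings reachable from m0.

    Returns a dict mapping each marking to a list of (label, successor).
    """
    graph = {}
    todo = deque([frozenset(m0)])
    while todo:
        m = todo.popleft()
        if m in graph:
            continue
        graph[m] = [(t[1], fire(m, t)) for t in transitions if enabled(m, t)]
        todo.extend(successor for _label, successor in graph[m])
    return graph

def deadlocks(graph):
    """Markings without any enabled transition."""
    return [m for m, successors in graph.items() if not successors]

# A two-place cycle: a moves the token from p0 to p1, b moves it back.
cycle = [
    (frozenset({"p0"}), "a", frozenset({"p1"})),
    (frozenset({"p1"}), "b", frozenset({"p0"})),
]
live_graph = reachability_graph(cycle, {"p0"})

# Dropping transition b leaves the token stuck in p1.
stuck_graph = reachability_graph(cycle[:1], {"p0"})
print(len(live_graph), deadlocks(live_graph), deadlocks(stuck_graph))
```

The exhaustive exploration used here is only practical for small nets; it serves as a reference point for the reduced analyses discussed later.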
Process Operators on Petri Nets
In this section the process of translating process expressions into Petri nets is highlighted. The approach is similar to the macro-module mapping approach [16] for translating a concurrent program into an asynchronous circuit. For each syntactical construct a Petri net element or operator is defined. In this way a process expression can be translated into a Petri net using a syntax-directed mapping scheme, detailed below.
In the literature a lot of research has been devoted to the development of operators for the composition of Petri nets. In [17] finite and safe nets are constructed for the Algebra of Communicating Processes (ACP) of Bergstra and Klop [1] without recursion. The notion of a non-deterministic choice is absent as well. In [14] safe and finite Petri nets are generated from a so-called anonymous language that contains CCS and almost the whole of CSP as special "cases". The alternatives of a choice may, however, not contain parallelism, and every body of a recursion starts with an invisible action τ. In [7] a Communication Petri Net Model is proposed. The focus is a Petri net algebra; the concepts of successful completion (cf. sequential composition) and recursive processes are therefore missing.
In this work, the translation of a process expression into Petri nets is defined by means of a translation function PT. The syntax-directed organization of PT makes it necessary to include all possible syntactic constructs of a process expression in the domain of PT, i.e. the set of variables Var, the atomic actions, as well as the process operators. As discussed in section 2, the sequential composition operator requires the notion of successful completion. As a result, the definition of a Petri net has to be extended to represent successful termination. Therefore we have chosen to classify certain places of the Petri net as end places. When all end places contain a token, the Petri net involved has terminated successfully. This new Petri net model is denoted as a PE-net and is defined below.
Definition 3.4 A PE-net is a tuple
In the above definition P, A, F and m0 have the same meaning as in (classical) labeled Petri nets (see definition 3.1) and E denotes the end places. The definitions and properties of (classical) labeled Petri nets can be lifted to PE-nets in a straightforward way. The initial marking of a PE-net, however, is represented by the set of initial places, i.e. the set of places that contain a token in the initial marking. This is possible due to the safe Petri net representations of the allowed process expressions. A safe Petri net has at most one token in each place, and its marking can therefore be represented by a set of places m, where p_i ∈ m indicates that there is a token in p_i. A non-safe representation would imply that the corresponding process expression exhibits auto-concurrency, i.e. is of the form x = P || x ..., which is not allowed in our syntax.
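The role of the end places can be sketched in Python as follows. The `PENet` record mirrors the five components named in the text (places, actions, transitions, initial places, end places); the field names, the `classify` helper and the shape of the example leaf net (one initial place, one transition, one end place) are assumptions made for illustration. Markings are sets of places, which relies on the nets being safe.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PENet:
    places: frozenset
    actions: frozenset
    transitions: frozenset   # elements are (inputs, label, outputs)
    initial: frozenset       # places marked in the initial marking
    end: frozenset           # end places

def terminated_successfully(net, marking):
    """All end places hold a token.

    Assumption of this sketch: a net without end places never
    terminates successfully.
    """
    return bool(net.end) and net.end <= marking

def is_dead(net, marking):
    """No transition is enabled in the given marking."""
    return not any(inputs <= marking for inputs, _a, _out in net.transitions)

def classify(net, marking):
    """Distinguish a running net, successful termination and deadlock."""
    if not is_dead(net, marking):
        return "running"
    return "success" if terminated_successfully(net, marking) else "deadlock"

# Assumed PE-net for a single action a: i --a--> e, with e an end place.
leaf = PENet(
    places=frozenset({"i", "e"}),
    actions=frozenset({"a"}),
    transitions=frozenset({(frozenset({"i"}), "a", frozenset({"e"}))}),
    initial=frozenset({"i"}),
    end=frozenset({"e"}),
)

# Same structure, but the end place z is never marked.
stuck = PENet(leaf.places | {"z"}, leaf.actions, leaf.transitions,
              leaf.initial, frozenset({"z"}))

print(classify(leaf, leaf.initial),
      classify(leaf, frozenset({"e"})),
      classify(stuck, frozenset({"e"})))
```

Separating "no transition enabled" from "all end places marked" is what lets a verifier tell successful completion apart from a genuine deadlock.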
Leaf Processes.

Sequential Composition. For the PE-net representation of a process Q1.Q2, the end places of PT(Q1) are combined with the initial places of PT(Q2), by means of a Cartesian product, effectively "abutting" the two PE-net representations together. This is shown in Figure 2. In [14] sequential composition is defined as a special case of parallel composition; by doing so, however, there is a need for an extra concealment or hide operator.

Choice Composition. For the PE-net representation of a process Q1 + Q2, conflicts are introduced between all pairs of initial transitions of PT(Q1) and PT(Q2), by means of a Cartesian product construction. The end places are combined similarly. For this to work properly, the initial places that are in cycles have to be extracted by a one-step unfolding; in a choice, once the decision of which branch to take is made by the first execution of a transition, a loop iteration may not cause the other branch to be taken. In our translation scheme, the PE-net representation of a recursive process definition already implements the desired unfolding (see Definition 3.9). The above situation can then only result from process expressions of the form P + (R ▷ Q) .... In these situations the one-step unfolding is accomplished through a separate root-unwinding step. This preprocessing step is only effective for those initial transitions that are in cycles. The PE-net equivalent of the choice operator is shown in Figure 3. Apart from the treatment of the end places, the above construction is similar to [7, 17]. In [17], however, the root-unwinding step introduces 2^(#Pcyc) new transitions, with Pcyc the set of initial places that are in cycles. In [7], an extra restriction applies: all transitions that have initially marked input places must be enabled.
Parallel Composition. In Petri nets a transition can be regarded as a synchronization mechanism, since it can only fire if all input places contain at least one token. To model parallel composition with rendezvous synchronization, it is then sufficient to "join" the transitions that have the same action label, different from τ. Since more than one transition may be labeled with the same action, all combinations have to be considered. This is shown in Figure 4.

Interrupt Composition. For the PE-net representation of Q1 ▷ Q2, we create a new place for every transition of PT(Q1). Each new place is then connected via a self-loop with its corresponding transition. These new places are then combined with the initial places as well as the end places of PT(Q2), by means of a Cartesian product construction. If Q2 is a recursive process expression, the one-step unfolding within PT(Q2) (see Definition 3.12) then prevents the transitions of PT(Q1) from firing for every loop iteration within the former net. However, if Q2 is of the form (P ▷ Q) ..., within PT(Q2) there can still be initial places that are in cycles (see Definition below). In this case an extra root-unwinding step is necessary. For the other cases, this preprocessing step has no effect. This construction is shown in Figure 5.

Definition 3.11 (Interrupt Composition) Let Q1 and Q2 be two processes, with PT(Q1) = (P1, A1, F1, M01, E1) and RootUnw(PT(Q2)) = (P2, A2, F2, M02, E2). Let Pnew be new places such that P1 ∩ P2 ∩ Pnew = ∅, and H a bijection H : F1 → Pnew. PT(Q1 ▷ Q2) is defined as:

Recursion. For the PE-net representation of a process μx.F(x), with F(x) a process expression containing variable x, we first compute PT(F(x)), using the translation techniques described above. The input places of each transition with label x are then combined with the initial places, by means of a Cartesian product, effectively creating the desired loop. The initial places as well as the initial marking are kept; this construction implements a one-step unfolding, making the root-unwinding step obsolete for the PE-net translation of process expressions of the forms P + P and Q ▷ P, with P a recursive process definition. This is shown in Figure 6.

Let PT(F(x)) = (P, A, F, M0, E). PT(μx.F(x)) is defined as:

where

Apart from the treatment of τ events, the above construction is similar to [7].
Translation of Process Calculus expressions
The translation of a (complex) process expression into a Petri net can be defined recursively as follows. We start at the bottom of the syntax tree by translating the "leaf" atomic actions to PE-nets, according to definition 3.5. As we go up in the syntax tree, we gradually build up the PE-net representations of the intermediate subexpressions. 
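The bottom-up organization of the translation can be sketched in Python for the two simplest constructs, atomic actions and sequential composition. This is a minimal sketch under stated assumptions, not the definitions of the paper: the leaf net is assumed to consist of one initial place, one transition and one end place; sequential composition is assumed to replace the end places of the first net and the initial places of the second by their Cartesian product; and nets are plain dictionaries with hypothetical keys `trans`, `init` and `end`.

```python
from itertools import count, product

_fresh = count()  # generator of fresh place names

def leaf(action):
    """Assumed PE-net of an atomic action: i --action--> e."""
    i, e = f"p{next(_fresh)}", f"p{next(_fresh)}"
    return {"trans": [(frozenset({i}), action, frozenset({e}))],
            "init": frozenset({i}), "end": frozenset({e})}

def seq(n1, n2):
    """Abut n1 and n2: glue places are pairs (end of n1, initial of n2)."""
    glue = set(product(n1["end"], n2["init"]))

    def left(places):   # rewrite end places of n1 into glue places
        return frozenset(x for p in places for x in
                         ([g for g in glue if g[0] == p]
                          if p in n1["end"] else [p]))

    def right(places):  # rewrite initial places of n2 into glue places
        return frozenset(x for p in places for x in
                         ([g for g in glue if g[1] == p]
                          if p in n2["init"] else [p]))

    trans = ([(left(i), a, left(o)) for i, a, o in n1["trans"]] +
             [(right(i), a, right(o)) for i, a, o in n2["trans"]])
    return {"trans": trans, "init": left(n1["init"]), "end": right(n2["end"])}

def translate(expr):
    """Post-order walk over a syntax tree of ("act", a) / ("seq", e1, e2)."""
    if expr[0] == "act":
        return leaf(expr[1])
    if expr[0] == "seq":
        return seq(translate(expr[1]), translate(expr[2]))
    raise NotImplementedError(expr[0])

def run(net):
    """Fire enabled transitions until none is left; return trace and marking."""
    marking, trace = net["init"], []
    while True:
        ready = [t for t in net["trans"] if t[0] <= marking]
        if not ready:
            return trace, marking
        inputs, action, outputs = ready[0]
        marking = (marking - inputs) | outputs
        trace.append(action)

# a.b.c as a nested sequential composition
net = translate(("seq", ("act", "a"), ("seq", ("act", "b"), ("act", "c"))))
trace, final = run(net)
print(trace, final == net["end"])
```

The remaining operators (choice, parallel, interrupt, recursion) would each contribute one more case to `translate`, with their own net construction.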
Verification Framework
The techniques presented in this paper have been implemented in a tool, called JULIE, which consists of about 9000 lines of C code. The tool starts from the process calculus, applies a syntax-directed mapping scheme to translate between process expressions and Petri nets, and performs an efficient analysis on the resulting Petri nets (e.g. checks for deadlocks, liveness properties, etc.). Currently, we are developing an automated translation between heterogeneous C-VHDL specifications, as they are used in the CoWare™ data model, and our process calculus. Because the overall mapping is semantics preserving, the analysis technique can be applied to validate the various intermediate specifications of the design process.
The analysis technique itself, called generalized partial-order analysis [18], tackles the two primary sources of combinatorial explosion that may occur in conventional reachability analysis. The first source is due to concurrently enabled actions, for which standard analysis requires enumerating all possible orderings. This problem can be avoided by e.g. applying existing partial-order techniques, where only one interleaved sequence needs to be analyzed for deadlock and liveness checks [15]. The second source is due to concurrently marked conflict places. This problem is solved by a generalized partial-order method which simultaneously explores concurrently enabled conflicting paths. The technique is based on a modified representation of markings to distinguish the different conflicting paths and can achieve an exponential reduction in algorithmic complexity.
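The first source of explosion can be quantified with a small Python experiment. The sketch below does not implement partial-order analysis; it only enumerates, by brute force, the markings and firing orders of n mutually independent transitions, which is the situation a partial-order method reduces to a single representative sequence. The function name and the encoding of a state as the set of not-yet-fired transitions are assumptions of this sketch.

```python
def explore(n):
    """Exhaustively interleave n independent, initially enabled transitions.

    Returns (number of distinct states, number of complete firing orders).
    """
    seen = set()
    orderings = 0

    def dfs(remaining):
        nonlocal orderings
        seen.add(remaining)
        if not remaining:          # every transition has fired
            orderings += 1
            return
        for t in remaining:        # any remaining transition may fire next
            dfs(remaining - {t})

    dfs(frozenset(range(n)))
    return len(seen), orderings

for n in (3, 4):
    print(n, explore(n))
```

For n independent transitions the exhaustive search visits 2^n states along n! orderings, whereas all orderings lead to the same final marking, so one of them suffices for a deadlock check.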
Case study
To assess the viability of our approach we performed an extensive case study. More specifically, we experimented with the design of a Quadrature Amplitude Modulation (QAM) modem, integrating both a sender and a receiver section. The block diagram of the modem design is depicted in Figure 8. In the sender part (bottom square) the data to be sent is first formatted into a stream of alternating I-data and Q-data, each three bits wide. Then, this data is discretely levelized by the slicer and sent to the modulation block, which multiplies the I-data and the Q-data by orthogonal carriers. The middle square of Figure 8 represents the test bench including the channel model and the user interface. The upper square represents the receiver part. A tracking module is needed to derive "a local copy" of the carrier frequency, as well as an n-taps adaptive equalizer for correcting channel induced distortion. The de-slicer and symbol extraction block perform the inverse operations of their sender counterparts.
The QAM modem was modeled using the CoWare™ environment [3]. The implementation target was a heterogeneous single-chip solution; because the (de)slicer and symbol creation (extraction) modules are dependent on the characteristics of the data, these modules were chosen to be implemented in software on an ARM-7 RISC processor. The other modules were specified in VHDL to be implemented in application-specific hardware. Simulations and formal verification using the techniques presented in the paper revealed that the heterogeneous C-VHDL specification was indeed deadlock free.
To implement the communication between the hardware and the software, the ARM-7 processor boundary has to be crossed. This is shown in Figure 9 for the communication between the tracking module of the demodulator and the de-slicer. To realize the software drivers and the hardware interfaces in Figure 9(b), the Symphony toolbox (the part of CoWare™ responsible for hardware/software interfacing and for system architecture co-synthesis and integration) has a number of I/O scenarios to select from. An I/O scenario (e.g. memory mapped I/O, interrupt driven I/O) consists of a software driver and a hardware counterpart for implementing a specific channel type on a particular processor. The Symphony toolbox assigns I/O scenarios on a per-channel basis. The I/O scenarios are designed to work correctly in a stand-alone operation mode. However, when assembled into a complete hardware/software interface, deadlocks may be introduced for a particular combination of I/O scenarios. Exhaustively verifying all these combinations by simulation is very time consuming, as a new compilation step is required for every I/O configuration investigated. Selecting the right I/O configuration simply by inspection is virtually impossible, as the "real" problem instance is far more complex than what is presented here. Our verification approach formally diagnoses the cause of a possible deadlock, and provides the necessary feedback to select the right I/O scenario combination(s). To give an insight into what this means, an example is worked out below.
Consider the following I/O configuration. The symbol creator of the sender part receives incoming samples from the INCHAR channel via software polling. Processes P1 and P2 of the de-slicer (see Figure 9(a)), on the contrary, are assigned to the FIQ and IRQ interrupt routines of the ARM-7, respectively. The corresponding hardware interface does not wait for the interrupt routines to complete; they are only "triggered".
Without loss of generality, we focus on the communication between the tracking module and the de-slicer. This can be modeled in our process calculus as follows:

The suffixes -B and -E denote the triggering and the completion of the interrupts involved. FIQ and IRQ represent (the calling of) the respective interrupt routines themselves. The (simplified) processor model of the ARM-7 captures the fact that the FIQ interrupt has a higher priority than the IRQ interrupt; in other words, the former interrupt can preempt the latter. If we run this example through JULIE we get the following reachability graph.
From state State9 in the table above, no progress is observed, which shows that this particular I/O configuration results in a deadlock. Indeed, after the first FIQ routine has finished, the IRQ routine is called. However, this latter routine, before becoming effective, can be preempted immediately by a new FIQ interrupt, which has a higher priority, and as a result we arrive in a deadlock situation.
In Table 1, a number of possible I/O configurations are listed. As can be observed, verifying a particular configuration takes less than 1 second of CPU time. From the table one can conclude that two configurations can result in a deadlock. For selecting the "optimal" I/O configuration, Symphony proceeds only with the remaining alternatives. This result clearly shows that the process calculus and the subsequent Petri net translation, being able to model all communicating processes involved as well as the processor model of the ARM-7 itself, are indeed a powerful combination for the analysis and synthesis of embedded systems.
Conclusions
In this paper we proposed a formal model based on Petri nets for reasoning about the behavior of a concurrent system. At the Petri net level, existing formal verification techniques can then be leveraged.
To provide a formal mapping between language-level constructs and Petri net operations, we proposed a process calculus as an intermediate model. We believe many of the existing process-based languages used to specify heterogeneous systems can first be translated to "intermediate code" of our process calculus. We presented an elegant syntax-directed translation scheme for building Petri net representations starting from process expressions.
The viability of our approach was tested on a "real-life" example, for which the results were very promising. We are currently investigating the inclusion of explicit timing constraints. In the future we plan to fully integrate the presented verification framework into the Symphony design flow.
