In this paper we present a process algebra approach for the integrated verification of correctness and performance in concurrent systems. The verification procedure is performed entirely within the Circal process algebra, without recourse to other formalisms. Performance is characterised in terms of logical properties, which do not incorporate explicit time. Such properties are then interpreted in terms of degree of concurrency and allow the quantitative evaluation of the throughput of the system. The approach has been applied to two four-phase handshaking protocols, which are motivated by the implementation of the AMULET2 asynchronous RISC processor. Both correctness and performance properties are captured in the same verification framework and automatically proved using the Circal System.
Introduction
Formal methods have tended to concentrate on verification of correctness of software and/or hardware systems. Typical practical reasons to apply formal methods have been to ensure the safety of a software or hardware system. Once the safety properties that the system is required to meet are specified, verification consists in checking that a communication protocol, an item of digital hardware, or a distributed computer system implementation is equivalent to its specification. However, in the broadest sense verification must include a performance analysis of the system, and formal methods should be extended to allow performance to be included in the analysis. This is especially the case for those application domains, such as hardware systems, where performance plays a key rôle in the choice between alternative technologies.
Past techniques in performance analysis have featured model development based on experience and intuition, and analysis by simulation or by analytical techniques based on assumptions that are known to be approximate. Two major modelling approaches that have been used for both verification and performance analysis are timed Petri nets and timed process algebras [6]. A standard method to introduce performance analysis into these two formal methods is to associate time with the actions of the process algebra and the transitions (or equivalently the residence time of the places) of the Petri net. These times could be either deterministic or stochastic. In the analysis of the former we can use max-plus algebra [1]. The latter (usually with exponential time distributions) leads to stochastic Petri nets and stochastic process algebras. Performance analysis is then based on a derived Markov chain.
Adding time in this way to both Petri net and process algebra models increases the complexity of any analysis procedure compared with the untimed case. Markov chain analysis is particularly restricted by state explosion. Some Petri net methods allow abstraction of time from the model where it is not significant by having zero time transitions, but most stochastic process algebras do not allow abstraction with zero time transitions.
In a previous paper [3] we introduced an approach to the integrated verification of correctness, timing and performance properties in concurrent systems, using the Circal process algebra and its mechanisation, the Circal System [10]. Our approach does not make any explicit use of time. Performance is not determined by absolute values, but by the degree of concurrency of the system components.
In this paper we develop our approach [3] into a rigorous methodology and we apply it to two different four-phase handshaking protocols, proving both their correctness and interesting performance properties, which allow a quantitative evaluation of the throughput of the system.
The Process Algebra
Process algebras are mathematical formalisms for describing systems of interacting Finite State Machines (FSMs). The interaction is given by synchronising the transitions that occur in different FSMs. This can be done in several ways, which differentiate the many process algebras that appear in the literature [2, 9, 10, 11].
Hierarchy of Processes
Every process algebra has one (or more) parallel composition operator, a hiding (sometimes called abstraction) operator and a relabelling operator. The combination of the three operators allows the structure of any system to be modelled as a hierarchy of abstraction levels, as shown in Figure 1. Every box represents a process and is decomposed into the components at the lower level that it is connected to. For example, process S_{1,2} consists of two components: process S_{2,2} and process S_{2,3}. Only the boxes that are leaves of the hierarchy, not necessarily at the lowest level, explicitly encapsulate behaviours (S_{1,3}, [...]
Every process interacts with other processes through communication ports. Interaction between processes occurs through the actions that are associated with the ports. In Circal [10], CSP [9] and LOTOS [2], a communication channel connects all the ports that are labelled with the same action. In CCS [11], actions are coupled in complementary pairs (input and output actions) and a directed communication channel connects the two ports that are labelled with complementary actions. In our approach we adopt the communication paradigm of Circal, CSP and LOTOS. Rather than formally introducing one of these process algebraic languages, we will present all necessary concepts in an intuitive graphical fashion.
If we look at the hierarchy given in Figure 1 from a bottom-up point of view, every process, apart from the root, is embedded within the parent process by composing it in parallel with other processes; by hiding some of the actions that are used for interaction; and possibly by relabelling other actions. [...] any line describing the communication in which the given port is involved, otherwise. The embedding of a set of processes within the next abstraction level is represented by a box surrounding the set of processes, with new bullets (with the corresponding actions, which may be relabelled, written next to them) on its boundary to represent all the ports that are not hidden after the composition. These new bullets are connected by lines to any of the corresponding internal bullets. For example, in Figure 2 [...]
In this way, the box that embeds a set of processes represents the interface of the composite process. Every process has a sort, which is the set of action names that label the ports on the box that embeds its components. For example, in Figure 2, S_0 and S_{1,1} have sort {a, b, c}, S_{1,2} has sort {b, c, d} and S_{1,3} has sort {b, d, e}. For a behavioural process, its sort (or interface) must contain at least all the actions that occur in the embedded behaviour.
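The relationship between the sorts of the components and the sort of the enclosing box can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `composite_sort` is hypothetical, and the hidden set {d, e} with no relabelling is inferred from the sorts quoted above, not read from Figure 2.

```python
def composite_sort(component_sorts, hidden=frozenset(), relabel=None):
    """Sort of a composite: union of the component sorts, minus the hidden
    actions, with the remaining actions optionally relabelled."""
    relabel = relabel or {}
    visible = set().union(*component_sorts) - set(hidden)
    return frozenset(relabel.get(action, action) for action in visible)


# Sorts quoted in the text for the three components of S_0.
s11 = {"a", "b", "c"}
s12 = {"b", "c", "d"}
s13 = {"b", "d", "e"}

# Assumption: d and e are the actions hidden when the components are embedded in S_0.
s0 = composite_sort([s11, s12, s13], hidden={"d", "e"})
```

Under that assumption the computed sort of the composite coincides with the sort {a, b, c} given for S_0.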
Process Behaviour
From a behavioural point of view, the parallel composition of behavioural processes may be expanded into a global behaviour. Every state of the global behaviour is given by the product of component states, one for each component. The precise semantics of parallel composition depends on the specific process algebra involved. In this paper we consider the approach adopted by the Circal process calculus [10], where every transition between states is labelled with a (possibly empty) set of actions.
Given a set of processes and a set of transitions, one for each process, the transitions of the set may synchronise if and only if the following holds for each action that belongs to the label of at least one transition: if the action does not occur in the label of some transition of the set, then it does not belong to the sort of the process to which that transition belongs. In this case causally independent actions are synchronised. If the transitions of the set may synchronise, then they must synchronise if and only if there is at least one action in the intersection of their labels; in this case identical actions from distinct component processes synchronise. When transitions of different processes synchronise, the label of the transition of the composite process is the union of the labels of all components.
For instance, if we compose S_{1,1}, S_{1,2} and S_{1,3} in Figure 3, then the transition labelled by {a, c} from A_{1,1} to A_{1,1} in S_{1,1} must synchronise with the transition labelled by {c} from A_{1,2} to B_{1,2} in S_{1,2}.
The corresponding transition of the composite process, which is represented in Figure 4 
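The synchronisation rule of this section can be sketched as a pair of set-based checks. The Python below is a minimal illustration, not the Circal System's implementation: the function names are hypothetical, each transition is represented only by the sort of its process and its label, and `must_synchronise` adopts a pairwise reading of the rule, which is an assumption.

```python
def composite_label(transitions):
    """Label of the composite transition: union of the participating labels."""
    return frozenset().union(*(label for _, label in transitions))


def may_synchronise(transitions):
    """transitions: list of (sort, label) pairs, one per participating process.

    Every action of the joint label that lies in a participant's sort must
    also appear in that participant's own label.
    """
    joint = composite_label(transitions)
    return all(joint & frozenset(sort) <= frozenset(label)
               for sort, label in transitions)


def must_synchronise(t1, t2):
    """Pairwise reading (assumption): compatible transitions that share at
    least one action cannot occur separately."""
    return may_synchronise([t1, t2]) and bool(frozenset(t1[1]) & frozenset(t2[1]))


# Sorts and labels quoted in the text for S_{1,1} and S_{1,2}.
t11 = ({"a", "b", "c"}, {"a", "c"})
t12 = ({"b", "c", "d"}, {"c"})
joint_label = composite_label([t11, t12])

# Hypothetical transitions, used only to exercise the other cases.
t12_d = ({"b", "c", "d"}, {"d"})   # c is in the sort but missing from the label
t13_e = ({"b", "d", "e"}, {"e"})   # shares no action with t_a below
t_a = ({"a", "b", "c"}, {"a"})
```

With these values the two transitions of the example are compatible and share the action c, so they synchronise into a transition labelled {a, c}; `t11` and `t12_d` are incompatible, while `t_a` and `t13_e` may occur together but are not forced to.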
Process Models of System Components
In our verification methodology we utilise the core modelling object, namely a process, to model four quite distinct artifacts. The first of these is the set of physical components of the system under investigation.
The second artifact that we model by a process, or processes, is the set of assumptions on the behaviour of the system. These assumptions usually relate to context or environmental restrictions and generally simplify the system behaviour when composed with it using the composition operator.
The third artifact that is modelled by a process is the specific property which we want to verify as holding in the system and which is used to describe the notion of system correctness.
The fourth artifact that is also modelled by a process is a refinement of part of the behaviour of another process. By composing a given process with a refinement process we extend the behaviour of the given process. Together with the hiding operator, this allows the definition of a new view of the system. In this paper we will consider only a single type of refinement, namely a time interval refinement. Processes may also model different artifacts at the same time. We will see processes that are both refinements and assumptions.
Modelling Asynchronous Micropipelines
In a RISC processor the instruction pipeline is composed of logic stages and latches. Progress through a synchronous pipeline is managed by the clock; once the logic has completed evaluation, all the latches are clocked at the same time, simultaneously moving all the instructions to the next pipeline stage. In an asynchronous micropipelined processor [12] the evaluation of a pipeline stage is governed by local interactions with its neighbours using a request-acknowledge handshaking protocol. It is possible that one stage is evaluating while at the same time a stage further on is transferring an instruction to its neighbour. Thus, whilst the performance of a synchronous pipeline is governed entirely by the clock rate that can be achieved with a particular logic design, the performance of an asynchronous pipeline also depends on the design of the handshaking controls for each stage. In particular, if the asynchronous logic pipeline is to be as fast as the synchronous one, it must be possible for the evaluations of all logic stages to overlap as in the synchronous case.
In our example, we analyse the correctness and the performance of two micropipelines which are motivated by the implementation of the AMULET2 asynchronous RISC [...] just sequences of latches without logic stages in between. The whole micropipelines can then be seen as FIFO queues.
Specification
The specifications of the single stages of the two latch control circuits we are going to model are given by the STGs (Signal Transition Graphs) [5] in Figure 5, where the dashed arrows denote the orderings that must be maintained by the environment (assumptions about the environment); the solid arrows denote the orderings that must be ensured by the circuit itself (properties of the circuit); and a solid circle is attached to an arrow in order to denote that the target of such an arrow is initially enabled to occur. Rin and Rout define the input and output request signals. Ain and Aout define the input and output acknowledgement signals. The Lt latch control signal causes the data latch to be open when low (Lt-) and closed when high (Lt+). The STGs in Figure 5 show that when input data is available (Rin+) the latch may close (Lt+) and then the input may be acknowledged (Ain+); when the output data has been acknowledged (Aout+) the latch may open again (Lt-). It is the responsibility of the environment to ensure that an input acknowledgement signal (Ain+) will eventually reset the input request (Rin-); the reset of the input acknowledgement (Ain-) will eventually be followed by a new input request signal (Rin+); an output request signal (Rout+) will eventually be acknowledged (Aout+); and the reset of the output request (Rout-) will eventually be followed by the reset of the output acknowledgement (Aout-). These assumptions about the environment are denoted by dashed arrows in Figure 5. Notice that the assumptions are the same for both STGs.
Implementation
The STGs in Figure 5 can be implemented as circuits using informal or semi-formal synthesis techniques. A possible implementation of the STG in Figure 5(a) is given in Figure 6(a) and a possible implementation of the STG in Figure 5(b) is given in Figure 6(b) [7]. In the implementation in Figure 6(a) we have used a conventional Muller C-gate, while in Figure 6(b) we have used two asymmetric Muller C-gates. In the notation for the asymmetric C-gates, an input connected to the main body of the gate controls both edges of the output; an input connected to the extension marked with "+" controls only the rising edge; an input connected to the extension marked with "-" controls only the falling edge. A logic description of the C-gates in Figure 6 [...]
Once the gates have been defined as the basic components, the whole circuit is specified by composing in parallel the processes that define the gates, with additional processes to define the delays in the gates or on the wires. For example, the circuit in Figure 6(a) is represented in the process algebra by the Sts process defined in Figure 7. To make the figure more readable, we have not indicated the labels of the internal actions that are abstracted away in Sts [...] given to the circuit in Figure 6(b). The solid triangles denote the ports that have boolean values. Such ports can be low, that is, ready to generate a rising signal (+), or high, that is, ready to generate a falling signal (-). For example, Rin can be ready to generate Rin- or Rin+. A port that is initially high is denoted by N; a port that is initially low is denoted by H. How to model Not and C has been presented previously [10, 4]. The D process, which has an input port on the left and an output port on the right, may be modelled by the behaviour in Figure 8(a), where the ports are initialised to the low level. D(in, out) defines an arbitrary delay between the in and out signals.
In fact, an arbitrary number of external actions may occur between input in+ and output out+.
Correctness Verification
The two STGs defined in Figure 5 can be modelled in our process algebra framework. This is done by defining a process for every single relationship between pairs of signals connected by an arrow, and by then composing all these processes together.
An arrow between signals pre and post without a solid circle attached is defined by the C process represented in Figure 9(a). Signal post is not initially enabled. Therefore, it must always occur either simultaneously with or after an occurrence of signal pre. An arrow between signals pre and post with a solid circle attached is defined by the IC process represented in Figure 9(b). Signal post is initially enabled. Therefore, after the first occurrence of post, every further occurrence of post must always occur either simultaneously with or after an occurrence of signal pre.
The dashed arrows are represented in terms of processes in the same way as the solid arrows. However, these processes play different rôles in the specification, in that processes that correspond to dashed arrows are assumptions, whereas processes that correspond to solid arrows are properties. The properties expressed by the STGs above are safety properties; they assert that, for all possible executions, the defined ordering of signals must not be violated. A verification methodology used in a process algebra framework consists of checking whether or not the process P that represents the safety property to be verified constrains the process S that represents the system. If P constrains S, then the property represented by P is not implicitly modelled in S; on the other hand, if P does not constrain S, then the property represented by P is implicitly modelled in S, that is, the system satisfies the property.
The key point of the methodology is how to check whether or not one process constrains another. This can be done by an appropriate combination of parallel composition and the equivalence checking procedure, which is available in the Circal System [10], a proof toolset which mechanises the Circal process algebra. In order to constrain a process S with another process P, we just need to compose S and P in parallel. Thus, if the constraint expressed by P is already implicitly modelled in S, then the parallel composition of S and P must be equivalent to S itself. If we denote parallel composition by * and equivalence checking by =, we need to check the equivalence:
S * P = S    (1)
When the safety property only holds under the assumptions defined by a process A, equivalence (1) becomes
A * S * P = A * S    (2)
In this case the property is verified for the constrained system A * S. In (2) A can be any assumption, including a timing constraint, and P any property, including a performance property. In this way the verification schema given by (2) integrates correctness, timing and performance verification [3].
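The constraint check expressed by equivalences (1) and (2) can be illustrated with a deliberately simplified Python sketch. It is not the Circal System and makes several assumptions that are not in the original: processes are deterministic, every transition carries a single action rather than a set, shared actions synchronise while the others interleave, and equivalence is approximated by comparing traces up to a bounded depth. All names (`Proc`, `compose`, `traces`, `holds`) and the req/ack processes are hypothetical.

```python
class Proc:
    """Deterministic labelled transition system with single-action labels."""

    def __init__(self, sort, init, trans):
        self.sort = frozenset(sort)
        self.init = init
        self.trans = trans  # state -> {action: next_state}

    def steps(self, state):
        return self.trans.get(state, {})


def compose(p, q):
    """Parallel composition: shared actions synchronise, the others interleave."""
    init = (p.init, q.init)
    trans, todo = {}, [init]
    while todo:
        state = todo.pop()
        if state in trans:
            continue
        ps, qs = state
        out = {}
        for a in p.sort | q.sort:
            # A component blocks an action only if the action is in its sort
            # and is not enabled in its current state.
            p_ok = a not in p.sort or a in p.steps(ps)
            q_ok = a not in q.sort or a in q.steps(qs)
            if p_ok and q_ok:
                out[a] = (p.steps(ps).get(a, ps), q.steps(qs).get(a, qs))
        trans[state] = out
        todo.extend(out.values())
    return Proc(p.sort | q.sort, init, trans)


def traces(proc, depth):
    """All action sequences of length at most depth."""
    result, frontier = {()}, {((), proc.init)}
    for _ in range(depth):
        frontier = {(tr + (a,), nxt)
                    for tr, state in frontier
                    for a, nxt in proc.steps(state).items()}
        result |= {tr for tr, _ in frontier}
    return result


def holds(s, p, assumption=None, depth=6):
    """Bounded check of S * P = S, or of A * S * P = A * S if A is given."""
    base = compose(assumption, s) if assumption is not None else s
    return traces(compose(base, p), depth) == traces(base, depth)


# Hypothetical processes: `free` performs req and ack in any order,
# `alternate` only allows req and ack to alternate, starting with req.
free = Proc({"req", "ack"}, "s", {"s": {"req": "s", "ack": "s"}})
alternate = Proc({"req", "ack"}, "idle",
                 {"idle": {"req": "busy"}, "busy": {"ack": "idle"}})
```

In this sketch `holds(alternate, alternate)` succeeds because the property adds no constraint, `holds(free, alternate)` fails because the property removes behaviour from the unconstrained system, and `holds(free, alternate, assumption=alternate)` succeeds once the same ordering is supplied as an assumption.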
In order to verify the correctness of the circuit given in Figure 6(a), which is modelled in Circal by the Sts process defined in Figure 7, we have to define A by composing in parallel the instantiations of C and IC that represent the assumptions, and P by composing in parallel the instantiations of C and IC that represent the properties. [...] If in (2) we replace S by the Sts process, we can automatically verify using the Circal System [10] that the equivalence is true. That is, the single stage modelled by Sts(Rin, Ain, Aout, Rout, Lt) meets the properties modelled by P under the assumptions modelled by A. Therefore, the implementation given in Figure 6(a) is correct with respect to the specification given in Figure 5(a).
In order to verify the correctness of the circuit given in Figure 6(b), we have to define A and P from the STG in Figure 5(b). [...] Analogously, we can automatically verify using the Circal System that the implementation given in Figure 6(b) is correct with respect to the specification given in Figure 5(b).
Performance Analysis
In the previous section we have seen how to automatically verify that the circuits defined in Figure 6 operate correctly. Whilst they are both correct with respect to the corresponding STG specifications, they show different performances. Several stages of our asynchronous micropipeline controller may be connected in series as shown in Figure 10. Here St_i, i = 1, 2, 3, must be instantiated by Sts(Rin_i, Ain_i, Aout_i, Rout_i, Lt_i) when using the control circuit given in Figure 6 [...]
The latches corresponding to the controller form a FIFO of registers. The maximum potential parallelism for such a FIFO occurs when all the latches are full at the same time. Whether or not such a potential parallelism is effectively attained depends on the handshake control protocol.
We define throughput as the number of data items that can be passed through the pipeline per complete handshake cycle. This definition is justified by the practical observation that asynchronous pipelines are limited by the elapsed time for one handshake cycle in the control stages. A handshake cycle for the i-th stage is the sequence of events from Rin_i+ to [...] number of pipeline stages that can be full at a particular time [13].
We can notice that in the STG in Figure 5(a) Aout_i must be low (and therefore the next latch empty; refer to Figure 10) before Lt_i can go high (and this latch become full). This is not the case for the STG in Figure 5(b), where the input side and the output side of the latch control stage are partly decoupled. A consequence of this decoupling is that a falling signal Aout_i- that acknowledges the falling signal Rout_i- in a given handshake cycle is concurrent with a rising signal Lt_i+ in the next handshake cycle. In this section we analyse the implication of this decoupling on the performance of the micropipeline.
The stages of the micropipeline are connected together as in Figure 10. The Sync processes, defined in Figure 8 [...]
We can carry out the automated analysis of the performance of the micropipeline using the same methodology we have used in the previous section for the correctness proof [3]. Now, the safety property to be used will capture some performance aspects of the system rather than just orderings of event occurrences. We want to express the performance in terms of throughput. We then need a way to characterise when a stage of the micropipeline is full. The i-th stage starts to be full when Lt_i goes high. At this point the data in the corresponding buffer is latched. We introduce the MI_1(from, to, mark) process given in Figure 11(a) to mark, with the new abstract action mark, the time interval between action from and action to. Action mark is called abstract because it does not belong to the set of physical actions performed by the system that we are modelling. When an instantiation of MI_1 is composed with a stage of the micropipeline as shown in Figure [...] This is an instantiation of equivalence (1). [...]
With the Circal System we can automatically verify that, using the simple control circuit, at most alternate stages can be occupied at the same time. In fact the Ps property holds for adjacent stages, but not for alternate stages. Therefore the degree of parallelism achieved is only 50% of the potential parallelism. This is equivalent to a throughput not greater than 50% [13].
If we check property Ps on a micropipeline of semi-decoupled control circuits, the property does not hold for any pair of stages. This proves that adjacent stages may also be occupied at the same time. So no upper bound of 50% is imposed on the throughput. However, we would like to know how many stages can be occupied at the same time. To achieve this, we need to use the view V(full) in Figure 13, in which the i-th stage is refined using the MI_{n-i} process, with n indicating the number of stages in the micropipeline. The generic MI_i process starts marking the time intervals from the i-th occurrence. MI_1 is defined in Figure 11(a) and MI_2 is defined in Figure 11(b). MI_3 may be defined analogously. In this way we start marking intervals only when the micropipeline is working at regime, that is, when all stages may be occupied.
We want to verify whether this potential full parallelism among all stages can be effectively achieved. Since full belongs to the sort of every instantiation of MI_i, i = 1, 2, 3, in Figure 13, according to the composition rules introduced in Section 2.2 the time intervals that characterise when stages are full have non-empty intersection for some possible execution iff the full action visibly occurs in the behaviour of V(full). In particular, if there is an execution such that all stages are full simultaneously in every handshake cycle, then the V(full) view is equivalent to the Pp(full) process defined in Figure 11(d), which models an infinite sequence of full actions. Using the Circal System we can prove that
V(full) = Pp(full)    (4)
and verify that a micropipeline of semi-decoupled latch control circuits has a possible execution such that all stages are full simultaneously in every handshake cycle. Therefore the semi-decoupled latch control circuit shows a possible throughput of 100%. Whether such a performance is effectively attained depends on the environmental context where the micropipeline operates. In a simple FIFO the maximum performance may be reached, but if the micropipeline incorporates processing logic there may be a performance degradation [7].
Discussion
In this paper we have presented a process algebra approach for the integrated verification of correctness [...] Information Technology Division of the Australian Defence Science and Technology Organisation and in part by Sun Microsystems Laboratories, USA.
