Requirements for Safe Parallel Systems
Designing sequential systems is a familiar technique to many, but designing parallel systems is perceived as difficult, and developing reliable parallel systems seems impossible. Indeed, the early draft standards discounted parallel processing as unsafe, because the committees considered that parallel systems introduce complexity which does not exist on a sequential machine. Consider two communicating processes accessing a resource. The messages between the processes must be examined to prevent deadlock, and a process must not be livelocked out of accessing the resource. The requirements of the processes must be handled fairly to keep the computational load balanced. Analysis of timing can become complex, as unknown communication delays have to be considered.
If the above problems can be removed or diminished, then parallel processing offers additional benefits. The world is an inherently parallel place, and the decomposition of a parallel problem leads naturally to a parallel implementation, avoiding the complexity which arises from forcing a sequential solution. Each process in a parallel system is simple and self-contained, which aids analysis. Performance can be scaled by adding extra hardware instead of optimising the code, and this extra hardware also offers increased reliability, with the possibility of fault tolerance through redundancy.
Design Issues

Safe State Analysis
Safe state analysis is concerned with identifying unsafe or hazardous states within systems. The system-level view has to be stressed here, for software on its own does not cause accidents or represent a potential danger to life and the environment. Software, hardware and the environment are all within the scope of our concern.
For concurrent systems, we concentrate on safety and liveness properties. Safety and liveness properties of concurrent programs have been studied and formally defined in the literature. Informally, we can state that:
A safety property stipulates that some bad thing does not happen during execution, and a liveness property stipulates that a good thing eventually happens during execution. Furthermore, we can introduce temporal logic to make statements about properties we wish to hold over time. The temporal logic includes the operators ALWAYS, EVENTUALLY and LEADS_TO, which operate over a list of predicates. Consider a level crossing. A safety property might be:
ALWAYS ( (car_over_rails AND NOT train_approaching) OR (NOT car_over_rails AND train_approaching) )
whereas a liveness property could be:
EVENTUALLY (barrier_down LEADS_TO train_approaching)
Now that we have a conceptual way of defining liveness and safety properties, we need a means of modelling a system so that we may analyse it for these properties.
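To make the three operators concrete, the following is a minimal sketch that evaluates them over a finite, recorded trace of states. The finite-trace reading, the function names and the example traces are our own assumptions for illustration; a temporal logic proper is defined over all possible executions. The safety predicate is transcribed literally from the level crossing formula, and only the inner LEADS_TO of the liveness formula is evaluated.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Predicate = Callable[[State], bool]


def always(p: Predicate, trace: List[State]) -> bool:
    """ALWAYS p: p holds in every state of the trace."""
    return all(p(s) for s in trace)


def eventually(p: Predicate, trace: List[State]) -> bool:
    """EVENTUALLY p: p holds in at least one state of the trace."""
    return any(p(s) for s in trace)


def leads_to(p: Predicate, q: Predicate, trace: List[State]) -> bool:
    """p LEADS_TO q: each state satisfying p is followed, at that
    state or later in the trace, by a state satisfying q."""
    return all(eventually(q, trace[i:])
               for i, s in enumerate(trace) if p(s))


def crossing_safe(s: State) -> bool:
    # The level crossing safety predicate, transcribed literally.
    return ((s["car_over_rails"] and not s["train_approaching"]) or
            (not s["car_over_rails"] and s["train_approaching"]))


def state(car: bool, train: bool, barrier: bool) -> State:
    return {"car_over_rails": car,
            "train_approaching": train,
            "barrier_down": barrier}


# Hypothetical traces: the second ends with a car and a train together.
good_trace = [state(True, False, False),
              state(False, True, True),
              state(True, False, False)]
bad_trace = good_trace + [state(True, True, False)]

safety_good = always(crossing_safe, good_trace)
safety_bad = always(crossing_safe, bad_trace)
liveness_good = leads_to(lambda s: s["barrier_down"],
                         lambda s: s["train_approaching"],
                         good_trace)
print(safety_good, safety_bad, liveness_good)
```

Checking a recorded trace in this way can only refute a property. Establishing that it holds for every execution requires a model of the whole state space.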
Petri Nets as a Model for Safe State Analysis
Petri nets are a mature model of concurrency, and high-level nets such as Coloured Petri Nets are powerful modelling tools with commercial software support. We can view the net as a model of the causal relationship between states of communicating sequential processes. Coloured Petri nets are an extension of Place/Transition nets and a refinement of Predicate/Transition nets, allowing tokens to represent data values from a multiset or colour set. The resulting nets are generally smaller than the equivalent Place/Transition net, because similar parts of the net may be folded into one, and the distinction between parts of the net is defined by token colours. Tokens can now represent complex data objects and, analogously, different token types are distinguished by their colour. A marking must now specify the multiset of tokens occupying each place. Boolean expressions can act as transition guards, enabling or disabling a transition. Arc expressions can change the values of variables bound to tokens, and generate or destroy tokens if needed.
Petri nets are especially useful because they allow us to model the whole system: not just the software and hardware under computer control, but also the non-deterministic environment in which the system operates. Such a systems viewpoint is necessary for control systems which interact with the environment.
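The ingredients just described (multiset markings, guards and arc expressions) can be sketched in a few lines. This is an illustrative toy, not the Design/CPN data model: the colour set of (process name, value) pairs, the place names and the single-input, single-output `fire` function are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

Token = Tuple[str, int]        # hypothetical colour set: (process, value)
Marking = Dict[str, Counter]   # place name -> multiset of coloured tokens


def fire(marking: Marking, src: str, dst: str,
         guard: Callable[[Token], bool],
         arc_expr: Callable[[Token], Token]) -> Optional[Marking]:
    """Fire a one-input, one-output transition once.

    The first token in `src` (in sorted order) that satisfies the guard
    is consumed, and the arc expression computes the token produced in
    `dst`. Returns the new marking, or None if the guard disables the
    transition for every token present.
    """
    for token in sorted(marking[src]):
        if guard(token):
            new = {place: Counter(tokens)
                   for place, tokens in marking.items()}
            new[src][token] -= 1
            if new[src][token] == 0:
                del new[src][token]
            new[dst][arc_expr(token)] += 1
            return new
    return None


# Two tokens of the same colour set share one folded place.
initial = {"waiting": Counter({("A", 1): 1, ("B", 5): 1}),
           "done": Counter()}


def value_above_three(tok: Token) -> bool:   # transition guard
    return tok[1] > 3


def increment(tok: Token) -> Token:          # arc expression
    return (tok[0], tok[1] + 1)


after = fire(initial, "waiting", "done", value_above_three, increment)
blocked = fire(after, "waiting", "done", value_above_three, increment)
print(after, blocked)
```

Only the token for process B satisfies the guard, so the second attempt to fire finds the transition disabled even though a token remains in the input place.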
Places are used to represent conditions and transitions to represent events. A hazard is defined as a set of conditions within a state from which there is a path to a mishap. From this definition, the reachability graph allows the designer to determine whether the system can reach any hazardous state. For complex nets, the reachability tree suffers from the state explosion problem. This problem may be overcome by identifying the hazardous states and working backwards to discover whether the initial state can lead to a mishap. An inverse net may be created (in which the input and output functions are reversed), using the hazardous state as the initial state and checking whether the true initial state can be reached. This does not always reduce the reachability tree significantly, but Leveson and Stolzy demonstrate algorithms that can reduce the amount of work needed in the analysis. It is possible to show that a system is free from hazards, but it does not necessarily follow that the system meets its specification (a totally safe system need not do any useful work). Safeness is therefore not a synonym for correctness.
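The backward analysis can be sketched as follows for an ordinary Place/Transition net. This is a plain breadth-first search over the inverse net, not the reduction algorithms of Leveson and Stolzy, and the two small level crossing nets (with and without a hypothetical crossing_free place acting as an interlock) are invented for the example.

```python
from collections import deque
from typing import Dict, List, Tuple

Marking = Tuple[int, ...]                          # tokens per place
Transition = Tuple[Dict[int, int], Dict[int, int]]  # (inputs, outputs)


def enabled(m: Marking, t: Transition) -> bool:
    return all(m[p] >= n for p, n in t[0].items())


def fire(m: Marking, t: Transition) -> Marking:
    new = list(m)
    for p, n in t[0].items():
        new[p] -= n
    for p, n in t[1].items():
        new[p] += n
    return tuple(new)


def invert(ts: List[Transition]) -> List[Transition]:
    """Inverse net: swap the input and output function of each transition."""
    return [(outputs, inputs) for inputs, outputs in ts]


def can_reach(start: Marking, target: Marking,
              ts: List[Transition], limit: int = 10_000) -> bool:
    """Breadth-first search of the reachability graph from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        m = queue.popleft()
        if m == target:
            return True
        for t in ts:
            if enabled(m, t):
                nxt = fire(m, t)
                if nxt not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state limit hit: inconclusive")
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# Places: 0 car_waiting, 1 car_over_rails, 2 train_away,
#         3 train_at_crossing, 4 crossing_free (guarded net only).
unguarded = [({0: 1}, {1: 1}), ({1: 1}, {0: 1}),
             ({2: 1}, {3: 1}), ({3: 1}, {2: 1})]
guarded = [({0: 1, 4: 1}, {1: 1}), ({1: 1}, {0: 1, 4: 1}),
           ({2: 1, 4: 1}, {3: 1}), ({3: 1}, {2: 1, 4: 1})]

# Start from the hazardous marking and search the inverse net for the
# true initial marking.
unguarded_hazard_reachable = can_reach((0, 1, 0, 1), (1, 0, 1, 0),
                                       invert(unguarded))
guarded_hazard_reachable = can_reach((0, 1, 0, 1, 0), (1, 0, 1, 0, 1),
                                     invert(guarded))
print(unguarded_hazard_reachable, guarded_hazard_reachable)
```

In the guarded net the backward search from the hazard explores only four markings and never meets the initial marking, so the hazard cannot arise. The sketch checks one hazardous marking at a time; a hazard expressed as a set of conditions would need one search per marking that satisfies them.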
Using these techniques we can analyse for deadlock and livelock; the two areas of concern in parallel systems. Manna and Pnueli state that all parallel programming problems can be reduced to problems of livelock and deadlock. Clearly, tools and methods for ensuring livelock and deadlock freedom would raise our confidence in the reliability and safety of parallel programming.
Deadlock
Deadlock occurs when all processes within the system become blocked. In parallel systems, this is most usually the result of an error in synchronisation between communicating processes. Deadlock analysis is a fairly trivial task. In terms of the Petri net reachability graph, it is sufficient to show that all end nodes of the graph represent states that can be found earlier in the graph, i.e. there is no node in the graph that represents a state where no further transitions can become enabled. Showing that at least one transition can always fire is sufficient to prove deadlock freedom.
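As a sketch of this check, the code below enumerates the reachability graph of a bounded net and reports every marking in which no transition is enabled. The example net, two processes that each claim two resources, is hypothetical and is chosen only because its deadlock is well known; all arcs are assumed to have weight one and each transition lists a place at most once.

```python
from collections import deque
from typing import List, Tuple

Marking = Tuple[int, ...]
# (input places, output places); every arc has weight 1
Transition = Tuple[Tuple[int, ...], Tuple[int, ...]]


def enabled(m: Marking, t: Transition) -> bool:
    return all(m[p] > 0 for p in t[0])


def fire(m: Marking, t: Transition) -> Marking:
    new = list(m)
    for p in t[0]:
        new[p] -= 1
    for p in t[1]:
        new[p] += 1
    return tuple(new)


def dead_markings(initial: Marking, ts: List[Transition]) -> List[Marking]:
    """Return the reachable markings in which no transition is enabled.

    Terminates only for bounded nets (finite reachability graph).
    """
    seen, queue, dead = {initial}, deque([initial]), []
    while queue:
        m = queue.popleft()
        successors = [fire(m, t) for t in ts if enabled(m, t)]
        if not successors:
            dead.append(m)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(dead)


# Places: 0 P idle, 1 P holds one, 2 P holds both,
#         3 Q idle, 4 Q holds one, 5 Q holds both,
#         6 resource A free, 7 resource B free.
opposite_order = [((0, 6), (1,)), ((1, 7), (2,)), ((2,), (0, 6, 7)),
                  ((3, 7), (4,)), ((4, 6), (5,)), ((5,), (3, 6, 7))]
same_order = [((0, 6), (1,)), ((1, 7), (2,)), ((2,), (0, 6, 7)),
              ((3, 6), (4,)), ((4, 7), (5,)), ((5,), (3, 6, 7))]
initial = (1, 0, 0, 1, 0, 0, 1, 1)

deadlocks_opposite = dead_markings(initial, opposite_order)
deadlocks_same = dead_markings(initial, same_order)
print(deadlocks_opposite, deadlocks_same)
```

When the processes claim the resources in opposite orders, the search finds the single marking in which each holds one resource and waits for the other. Claiming them in the same order leaves no dead marking.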
Livelock
Livelock analysis requires more effort. Livelock occurs when an individual process is indefinitely blocked from progress. We need to show that for any individual set of states (realising a single process within the system), there is always the potential for progression to a new state. We can specify the required behaviour as a liveness criterion in temporal logic, e.g. EVENTUALLY (p1 LEADS_TO p4): a token at p1 will eventually be able to progress from place p1 to p4. Inspection of the reachability tree can confirm this property.
This assumption precludes the use of a fair scheduling policy for processes at the implementation level. Hoare suggests that the correctness of a program should not depend upon any fairness assumptions of the implementation and that it is the programmer's responsibility to ensure that the program behaves correctly. This requires us to remove all unbounded non-determinism from our model of the system, or at least impose a semantic check on the model which highlights those parts of the system that may result in unfair choice. This is a qualitative rather than quantitative expression of liveness. In hard real-time systems, livelock over a 10-minute period may be regarded as equivalent to deadlock. If the process is blocked for longer than some critical time period, it may well have the same consequences as if it were blocked forever. Therefore we need to incorporate time into the model.
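One way to mechanise this inspection is sketched below. It reads "p1 LEADS_TO p4" as "once p1 is marked, a marking with p4 marked stays reachable until it is actually reached", which is our own interpretation of "able to progress". The four-place nets are hypothetical, and the check establishes only the potential for progress, not that progress occurs under every scheduling.

```python
from collections import deque
from typing import Dict, List, Tuple

Marking = Tuple[int, ...]
Transition = Tuple[Tuple[int, ...], Tuple[int, ...]]  # weight-1 arcs


def successors(m: Marking, ts: List[Transition]) -> List[Marking]:
    result = []
    for inputs, outputs in ts:
        if all(m[p] > 0 for p in inputs):
            new = list(m)
            for p in inputs:
                new[p] -= 1
            for p in outputs:
                new[p] += 1
            result.append(tuple(new))
    return result


def reachability_graph(initial: Marking,
                       ts: List[Transition]) -> Dict[Marking, List[Marking]]:
    graph: Dict[Marking, List[Marking]] = {}
    queue = deque([initial])
    while queue:                       # bounded nets only
        m = queue.popleft()
        if m not in graph:
            graph[m] = successors(m, ts)
            queue.extend(graph[m])
    return graph


def can_always_progress(initial: Marking, ts: List[Transition],
                        src: int, dst: int) -> bool:
    graph = reachability_graph(initial, ts)
    # Backward fixpoint: markings from which dst can still be marked.
    good = {m for m in graph if m[dst] > 0}
    changed = True
    while changed:
        changed = False
        for m, succs in graph.items():
            if m not in good and any(s in good for s in succs):
                good.add(m)
                changed = True
    # Forward from src-marked markings, stopping once dst is marked.
    pending, queue = set(), deque(m for m in graph if m[src] > 0)
    while queue:
        m = queue.popleft()
        if m in pending or m[dst] > 0:
            continue
        pending.add(m)
        queue.extend(graph[m])
    return pending <= good


# Places: 0 = p1, 1 = p2, 2 = p3, 3 = p4.
live_net = [((0,), (1,)), ((1,), (3,)), ((3,), (0,))]
# Extra branch p2 -> p3, where the token then circulates forever.
livelocking_net = live_net + [((1,), (2,)), ((2,), (2,))]
initial = (1, 0, 0, 0)

live_ok = can_always_progress(initial, live_net, src=0, dst=3)
livelock_ok = can_always_progress(initial, livelocking_net, src=0, dst=3)
print(live_ok, livelock_ok)
```

The second net never deadlocks, because the transition on p3 can always fire, yet the check fails: once the token enters p3 it can no longer reach p4.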
Timed Petri net models usually associate a time delay with either places or transitions. Tokens arriving at a transition are unavailable for firing on the transition's output arcs until after some specified time period. The reachability tree for a timed net is a subset of the untimed tree, and as such is smaller than (or in the worst case equal in size to) the untimed tree. This is an example of timewise refinement: the addition of timing information makes the system behaviour more predictable. Davies states that including timing information has the effect of resolving non-determinism. Using this extra timing information, we can ask questions such as 'does place 1 eventually fire transition 8 within 4 ms?'.
Holding shows how time critical and safety critical software may be specified and designed using Temporal Logic and Petri Nets. He recognises the need for a property preserving transformation from the Petri net to some implementation language. Occam is a strong choice since it is highly parallel and has a well defined semantics, and because of this it is the preferred target language for transformation by many researchers including Gorton [Gor-90] and Croll.
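The effect of timing on the reachable states can be illustrated with a deliberately restricted sketch. We assume a net carrying a single token, so that every transition leaving a place becomes enabled at the same instant, a fixed delay on each transition, and that the transition with the shortest delay fires first and disables its competitors. The place names and delays are hypothetical, and a general timed net would need an enabling clock per transition.

```python
from typing import List, Optional, Set, Tuple

# (source place, target place, firing delay in ms) for a one-token net
Edge = Tuple[str, str, float]


def untimed_reachable(start: str, edges: List[Edge]) -> Set[str]:
    """Places the token can reach when delays are ignored."""
    seen, stack = {start}, [start]
    while stack:
        place = stack.pop()
        for src, dst, _ in edges:
            if src == place and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def fastest(place: str, edges: List[Edge]) -> List[Edge]:
    """Transitions out of `place` that share the shortest delay."""
    out = [e for e in edges if e[0] == place]
    if not out:
        return []
    shortest = min(delay for _, _, delay in out)
    return [e for e in out if e[2] == shortest]


def timed_reachable(start: str, edges: List[Edge]) -> Set[str]:
    """Places the token can reach when the fastest transition wins."""
    seen, stack = {start}, [start]
    while stack:
        place = stack.pop()
        for _, dst, _ in fastest(place, edges):
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def time_to_reach(start: str, target: str,
                  edges: List[Edge]) -> Optional[float]:
    """Elapsed time to reach `target`, or None if the timed behaviour
    is not a single path to it (dead end, unresolved tie, or cycle)."""
    place, elapsed, visited = start, 0.0, {start}
    while place != target:
        choices = fastest(place, edges)
        if len(choices) != 1:
            return None
        _, place, delay = choices[0]
        elapsed += delay
        if place in visited:
            return None
        visited.add(place)
    return elapsed


edges = [("p1", "p2", 1.0), ("p1", "p3", 3.0),
         ("p2", "p4", 2.0), ("p3", "p4", 0.5)]

untimed = untimed_reachable("p1", edges)
timed = timed_reachable("p1", edges)
elapsed = time_to_reach("p1", "p4", edges)
within_deadline = elapsed is not None and elapsed <= 4.0
print(sorted(untimed), sorted(timed), elapsed, within_deadline)
```

Here the choice at p1 is non-deterministic in the untimed net, but the delays resolve it: p3 is never visited, the timed set is a subset of the untimed set, and the question "does the token reach p4 within 4 ms?" has a definite answer.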
Temporal Specification and Parallel Systems
In order to reason about a parallel real-time system, it is necessary to consider the behaviour of the system with respect to the variable time. Wirth first pointed out the need to distinguish between program correctness and the satisfaction of timing properties in 1977. Since then many differing approaches have been advocated, from logics with extensions for time through to programming languages with real-time semantics. One approach being investigated is the use of an executable specification language, PAISLey, to design occam programs with predictable temporal behaviour. This approach has advantages over more traditional formal approaches, as it allows the incorporation of hardware specific information into the specification, and hence more realistic analysis can be performed.
For instance, checking the consistency of timing constraints in a parallel system can be a complex task. In the PAISLey environment this analysis can be done automatically, since the execution environment has a constraint analyser. Furthermore, timing inconsistencies may occur at simulation time that were not detectable by the static analysis of the PAISLey checker. These are brought to the attention of the user during the simulation. It is consequently up to the user to generate test data which ensures that all timing extremes are exercised during a simulation.
Having analysed the specification and performed a structured transformation of the specification into an implementation, we still have to prove that the implementation can meet the constraints we have specified. This requires implementation-related information to be included in the simulation. In the case of a transputer implementation, this requires knowledge of context switching times and low priority scheduling. For example, assuming that all controlling processes must be placed on a single transputer allows the use of the built-in scheduling methods of PAISLey. Ensuring a round-robin schedule on all the processes gives a crude simulation of the low priority mechanisms of the transputer. Modifying the execution environment then produces a closer simulation. This is achieved by means of user commands within the PAISLey environment. For example, Figure 1 shows the commands used to initialise the environment to act as a 5 MHz T8. The scheduling is set to round-robin with setscheduling, and setoverhead ensures that processes are brought to the top of the queue every 2 microseconds, which is approximately the period of two timeslices.
Figure 1: PAISLey environment modified to simulate a 5 MHz T8.
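The kind of question such a simulation answers can be sketched independently of PAISLey. The code below is a generic round-robin model and does not reproduce the PAISLey commands or their semantics: the process names, work amounts, timeslice, switching overhead and deadline are hypothetical, in arbitrary time units, and the overhead is charged on every dispatch that follows another.

```python
from collections import deque
from typing import Dict


def round_robin(work: Dict[str, int], timeslice: int,
                switch_overhead: int) -> Dict[str, int]:
    """Simulate round-robin scheduling on a single processor.

    `work` maps each process to the processor time it needs. Each
    process runs for at most one timeslice and then rejoins the back
    of the queue. Returns the time at which each process completes.
    """
    queue = deque(work.items())
    clock = 0
    finished: Dict[str, int] = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(remaining, timeslice)
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))
        else:
            finished[name] = clock
        if queue:
            clock += switch_overhead   # cost of dispatching the next one
    return finished


# Hypothetical controlling processes placed on one processor.
work = {"sensor": 3, "control": 5, "log": 2}
finish = round_robin(work, timeslice=2, switch_overhead=1)

deadline = 12   # hypothetical timing constraint for every process
late = sorted(name for name, t in finish.items() if t > deadline)
print(finish, late)
```

With these figures the control process completes after the deadline, which is the sort of inconsistency that only becomes visible once implementation-related overheads are included in the model.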
Integration: What Is Needed?
The two examples of commercially available tools presented here are StP and Design/CPN. These constitute two different approaches to the integration of design techniques. The StP approach uses a well established design methodology and integrates various formalisms to check differing aspects of the design. The Design/CPN system uses a consistent approach throughout, concentrating on applying a rigorous theory which integrates all the necessary mechanisms (i.e. time, state analysis, modularity, hierarchies). Each approach has its own virtues and drawbacks. But if heed is taken of the lessons of science in general, mathematics and physics being two cases in point, it can be seen that the "unified solution to everything" never survives long under close scientific scrutiny. Thus, it is the authors' belief that a flexible environment, as offered by StP, which allows different techniques to be used for different problems, has more potential. The future may, or may not, provide better approaches for each of the problems highlighted. If such solutions arrive, the environment can be changed to cater for these advances, leaving the intuitive design process constant and changing only the analysis performed.
Software through Pictures
Software through Pictures (StP) is a CASE package developed by IDE. It provides an open environment to which user applications may easily be added. It allows an integrated systems approach which is guaranteed to provide consistency between tools. The use of CASE and rapid prototyping can encourage less time being spent reasoning about a design. If formal checks can be introduced into the design cycle, then we can have more confidence in the design. This is a step towards truly engineered software, in which CASE tools and proof assistants are united in a fully automated design environment. Figure 2 shows an example of StP with extensions to include a Petri net editor, the PAISLey specification language and the occam toolset.
Design/CPN
The CASE tool used is Design/CPN, which is a commercial product (Figure 3). This will emphasise how a diagrammatic approach maintains the inherent parallelism of the design while abstracting the mathematics.
Conclusions
CASE tools provide a framework for the widespread adoption of formal methods in software engineering. For safety applications, hiding formal methods behind automated tools is the only way to increase confidence in designs given the current level of expertise within the software engineering community. This is the subject of continuing research.
