Abstract. This article presents how various Formal Methods have been involved, first on their own, then coupled, in the different steps of the industrial development of embedded software for an electricity meter. Synchronized Transition Systems have been used to conceive and implement some Rendezvous mechanisms for the distributed kernel, and the physical link protocol supporting communication between processors. The Rate Monotonic Analysis model has been extended to suit some features of the product; however, it appeared too coarse to reach a positive outcome. So we coupled both (Synchronized Transition Systems and Rate Monotonic Analysis) to achieve a fine analysis of the temporal properties of the system under development. This can be considered a first step towards Formal Methods Engineering.
Introduction
The aim of this article is to report on an experience of using Formal Methods in a real industrial project. For lack of space we shall not enter into the details of all the modelling and checking done during this project. Instead we present an overall survey of all the aspects of this project where Formal Methods have been used, with their impact on its development. We wish to advocate that Formal Methods can be really and profitably applied in a wide variety of situations.
The Eurotri Project
The EUROTRI project was launched by Schlumberger-Industries to define, conceive, design and develop a static electricity meter with extensive capabilities for tariff programming and remote metering. As short development time and low production cost (i.e. use of components at the frontier of their technical limits) were the challenges, it was decided at the very beginning of the project to use Formal Methods in most steps of it, from feasibility study to acceptance testing. The first contacts date from the beginning of '92, and the meter passed the final tests straight away in mid '94. This time to market has been considered very short, and attributed to the use of Formal Methods all along the project. 1995 has been dedicated to the evaluation by the industrial party of the integration of this process in an industrial environment.
The team in charge of the project was Sémantique du parallélisme (Concurrency Semantics), headed by Prof. A. Arnold. Although the whole team was involved in the cooperation, a hard core was composed of A. Arnold In this paper, we present how the research team joined the industrial project and played its part; then how Formal Methods have been used at different levels, different times and in different ways; and finally, as a conclusion, the results of this process for every party.
Why Use Formal Methods?
This question relates to two important issues: what benefit is expected from using Formal Methods, and why choose these particular ones?
Expected benefit. Formal Methods have been used with the main aim of avoiding backward steps in the development: as every step is mathematically verified, there is no need to loop back to some erroneous previous step, or even to iterate. This ideal straightforward development is the big promise of Formal Methods, provided they are efficiently applied.
Behavioural study. Concurrent behaviour was at the heart of the project: from implementation (study of the protocol of the physical link handling signals between both processors) to the design step (design of the Rendezvous mechanism) and the consistency of the call graph of the set of tasks shaping the application layer. The Synchronized Transition Systems model uses Labelled Transition Systems to describe individual behaviours, and Synchronization Systems, i.e. sets of firable configurations of labels, to express interactions between the individual behaviours. The product of the Labelled Transition Systems with respect to the Synchronization System is then built; it represents the behaviour of the whole interacting system. Boolean properties can then be computed on the states and transitions of this graph [2]. The availability of a powerful model-checker, the MEC tool [3], settled the choice.
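The construction described above can be illustrated by a minimal sketch. It is not the MEC tool nor its input language: the function names, the dictionary encoding of transition systems, the idle label "e" and the two-task example are all assumptions made for illustration. The sketch builds the reachable part of a synchronized product and lists its deadlock states, that is, states with no outgoing transition.

```python
from collections import deque

IDLE = "e"  # assumed label meaning "this component does not move"

def synchronized_product(components, initial, sync_vectors):
    """Explore the reachable part of a synchronized product.

    components   : list of dicts {(state, label): next_state}, one per
                   labelled transition system
    initial      : tuple of initial states, one per component
    sync_vectors : collection of label tuples allowed to fire together
    """
    states, transitions = {initial}, []
    todo = deque([initial])
    while todo:
        current = todo.popleft()
        for vector in sync_vectors:
            target = []
            for lts, state, label in zip(components, current, vector):
                if label == IDLE:
                    target.append(state)
                elif (state, label) in lts:
                    target.append(lts[(state, label)])
                else:
                    break  # one component cannot fire its label
            else:
                target = tuple(target)
                transitions.append((current, vector, target))
                if target not in states:
                    states.add(target)
                    todo.append(target)
    return states, transitions

def deadlocks(states, transitions):
    """Reachable global states without any outgoing transition."""
    return states - {source for source, _, _ in transitions}

# Hypothetical two-task example: a caller and a callee of one Rendezvous.
caller = {("idle", "call"): "waiting", ("waiting", "done"): "idle"}
callee = {("ready", "accept"): "serving", ("serving", "end"): "ready"}
vectors = [("call", "accept"), ("done", "end")]

states, transitions = synchronized_product(
    [caller, callee], ("idle", "ready"), vectors)
blocked = deadlocks(states, transitions)
```

With both synchronization vectors the product has two states and no deadlock; removing the second vector leaves the pair blocked in ("waiting", "serving"), which is the kind of property a model-checker computes on the product graph.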
Temporal study. The formal approach to the temporal aspects of the system, more precisely the respect of every deadline, has been considered since the beginning out of reach of the automata approach, as time could in no way be unfolded over so wide a range of periods: some tasks have a frequency of 50 kHz, some others a period of 6 months. A Rate Monotonic Analysis [12] had been successfully used in a previous development, and it was considered suitable to ensure the desired timing properties. Unfortunately the Rate Monotonic Scheduling hypotheses (perfect scheduler, instantaneous preemption, periodic and independent tasks, etc.) were not met, and the load of both processors was too high to allow such a coarse approach. As it happened, during the development a coupling of both Formal Methods has been successfully studied and used to achieve a fine study of the loaded system.
Who uses them? Formal Methods are just another tool to use in a development, and that means they have to be practicable by the engineers in charge of the project, who already have a lot of tools to master. In the case we present here, their common basic culture was electronics, and the readability of automata by electronic engineers, as well as the fundamentally synchronous approach of the model, counted in favour of this choice. On the other hand, the Rate Monotonic Analysis had been used in a previous (and much lighter) project the year before.
Formal Methods at Work
Traditionally at Schlumberger the families of trades involved in such a project were marketing, industrialization and engineering. In this case the technical part was composed of three teams: mechanics, electronics, software.
As validation had to deal with electronics as well as software, and was directly involved in the reliability of the product, we decided to propose another team dedicated to validation, based on the Sémantique du parallélisme team, with frequent exchanges; this team has been known in the project as the "Bordeaux team", which placed the use of Formal Methods outside the software team.
The strong point of the Bordeaux team was its expertise on the MEC system, already used in various domains, and its ability to tailor ad hoc versions if necessary; the weak point was that it consisted of a limited number of part-time researchers. So all aspects of the project could not be covered at the same time. Instead of scattering the forces, we proposed to focus on the kernel design and development with the dedicated team, while another development team would deal with the application, which was a more classical matter for the people involved and supposed to be more easily contained.
In order to save time, two sub-teams of the software team developed simultaneously in a layered approach: one sub-team had to deal with the underlying electronics to provide services used by the other sub-team to achieve the functionalities of the software as defined by the marketing team, and compatible with industrialization requirements.
The Embedded System
The new concept of electricity meter and its advanced functionalities implied the massive use of software, unlike previous generations which could be developed on an electromechanical or electronic basis.
Basic features of the meter were obvious: sampling signals (either power characteristics or signaling elements) and signal processing, keeping a database of tariffs and counters, managing protocols. Some other features are less obvious: safety of measurement (whatever could happen on the metered lines, even drops of power), integrity of the system (protection against fraud), or low energy consumption of the meter itself. Another need was to design a family of products, and the different variants had to be derived afterwards in an easy and safe way.
Preliminary Study
It appeared that a single processor of the Texas Instruments 320C30 class (for example) could be a good candidate to support the features, except for the power consumption of the meter. A low consumption could be achieved only with very low-level processors, and no single one could achieve all the features. The idea then was to use "more than one" low-level, low-consumption, low-price processors (such as the ones used in domestic applications), in order to meet the requirements.
To cover all features, two different processors have been chosen, one more dedicated to signal processing, and the other to control. But nearly every feature had to use both of them to be achieved. It was then necessary to conceive a system with heavy use of communication between the processors, and that raised the question of reliability, leading to the need for a reliable software architecture, and thus to the use of some Formal techniques.
It was then decided to conceive, in a layered approach, a distributed kernel implementing a Rendezvous scheduler, offering services to a distributed application, and to verify this architecture using Formal techniques.
It has been part of the preliminary study to propose both the architecture and the ways to develop and verify it. So the Formal Methods have been considered as a companion for the project since the beginning of the design phase.
Formal Techniques at Hand
The kernel had to offer such elaborate mechanisms as scheduling services, based on very low-level hardware. This gap made the software difficult to conceive with a high level of confidence.
On the other hand that meant that a verified description at some abstract level would be available for the engineers who had to implement the software system. And the readability of the description would be part of the reliability of the implementation.
Synchronized transition systems are very handy for expressing Rendezvous techniques, and this would allow an abstraction of the kernel, and thus a test of the Rendezvous architecture of the application layer. A description explicitly using automata happened to cover widely the different cultures of the people involved in the project, on the software part, on the hardware part (closely interfaced with the software) and on the validation part. Even if such systems as MEC can model systems of up to millions of states, it was stated that it would be out of reach to model all aspects at once (from signals on interruption lines to a succession of nested Rendezvous calls); actually we stated that this would be the wrong way to proceed. The validation of the kernel would be made layer by layer, abstracting lower and upper layers.
Explicit time expression is not easily done using transition systems, even if some academic experiments are very promising [13]. To model explicit time aspects, and more particularly the meeting of deadlines, we decided to use the Rate Monotonic Analysis model, which was well suited for this type of system to express whether or not the deadlines could be met for concurrent tasks scheduled by a Rate Monotonic Scheduler. The applicability hypotheses for the Rate Monotonic Analysis are: perfect scheduler (e.g. context switching in zero time); instantaneous preemption of tasks; purely periodic and independent tasks; non-overlapping executions of tasks; priorities static and strongly related to frequencies. Unfortunately the tasks are not scheduled using a Rate Monotonic Scheduler, and the applicability hypotheses are not fulfilled.
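Under the ideal hypotheses listed above, the classical sufficient schedulability test of Rate Monotonic theory compares the processor utilization with the Liu and Layland bound n(2^(1/n) - 1). The sketch below is given only to fix ideas: the task sets are hypothetical (they are not the meter's tasks), each task is a pair (execution time, period), and deadlines are assumed equal to periods.

```python
def liu_layland_bound(n):
    """Least upper bound on utilization for n independent periodic tasks."""
    return n * (2 ** (1 / n) - 1)

def utilization(tasks):
    """Total processor utilization of a list of (cost, period) pairs."""
    return sum(cost / period for cost, period in tasks)

def passes_utilization_test(tasks):
    """Sufficient (not necessary) condition: a task set that fails this
    test may still be schedulable and needs a finer analysis."""
    return utilization(tasks) <= liu_layland_bound(len(tasks))

# Hypothetical task sets, in arbitrary time units.
light_load = [(1, 4), (1, 5), (2, 10)]   # utilization about 0.65
heavy_load = [(2, 4), (2, 5), (2, 10)]   # utilization about 1.10

light_ok = passes_utilization_test(light_load)
heavy_ok = passes_utilization_test(heavy_load)
```

For three tasks the bound is about 0.78, so the first set passes and the second does not. A heavily loaded processor falls outside what this coarse test can decide, which is why a finer analysis is needed in that case.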
A strong effort has been made to stay as close as possible to the Rate Monotonic Scheduling algorithm, but some verifications had to be made, and the applicability of the Rate Monotonic Analysis had to be checked on all reachable configurations of this distributed system. So this model has been coupled with the Synchronized Transition System one to perform these verifications and checks and to validate control and call structures.
As it turned out later, the Formal Methods approach with the MEC tool was profitable for both aspects of the project: ab initio for the kernel, and a posteriori for the application, leading to a good quality production, revealed by a surprisingly short time for industrial validation tests.
Even more surprisingly, this approach could be used to diagnose faulty behaviours observed at testing time, whose very low rate of appearance made the diagnosis tedious.
In the following we present these different aspects of the use of Formal Methods in this development project.
Development of the Kernel
The aim of the layer was to provide a cooperation mechanism between tasks as close as possible to the ADA concept of Rendezvous. For economical reasons, the hardware is of very low level and is composed of two processors communicating through a limited number of signals, without shared memory.
The challenge of the development of the kernel is to fill the gap between the hardware constraints and the high-level, implementation-independent features of the kernel.
In order to divide and conquer difficulties, we adopted a layer-oriented architecture.
Hardware Description and Provided Services
The hardware is composed of two processors, a micro-controller (µC) and a Digital Signal Processor (DSP). DSPs are processors specialized in signal processing and are rather poor on control aspects such as stacks or instruction sets. The DSP holds up to 2K instructions and 256 words of RAM, and the micro-controller up to 32K instructions and 1K bytes of RAM, including registers. They are linked to some physical devices for measurement and control, and connected in both directions through two serial lines and some signal lines (see figure 1). These serial lines carry bytes; signals, including a clock common to both serial lines and driven by the µC, are also present on the other lines. The DSP is badly suited for running a kernel: no interruption is associated with the line; moreover, the line is sensitive to noise altering messages and clock.
The kernel is designed to provide a distributed application with task calls and communication, either locally (same processor) or remotely, in a way analogous to ADA Rendezvous, in a severely restricted environment.
The principles of the Rendezvous have been retained as a unified concept for task synchronization and communication. We call local a Rendezvous whose caller and callee tasks run on the same processor, and distant the other ones. We call strong a Rendezvous where both tasks have to exchange a message (the first to arrive waits for the other) to allow the Rendezvous to proceed (in the callee's code), and weak a Rendezvous where the caller leaves a message, whether or not the previous message has been delivered. This terminology is the one used in the project; only strong Rendezvous are real ones; the others are message passing.
The semantics of the Rendezvous have to be independent of the local/distant distinction.
Rendezvous Scheduler
Synchronization and communication between tasks are achieved through a Rendezvous scheduler. This scheduler is responsible for implementing the Rendezvous: it manages the Rendezvous calls and accepts produced by the application layer. The structure of this scheduler is strongly related to the safety of the system (freedom from blocking) and to its performance (too much control to avoid blocking would slow down a critical system). To keep the balance between both constraints, we decided to use formal modelling at two levels: the first level was the functional specification of the interfaces of the scheduler, and the definition of essential properties to check, with the aim of building a validated scheduling structure (control and variables); the second level was operational and aimed to detect useless controls in order to remove the corresponding structures from the specification, to save memory space and computation time.
The functional description of this scheduler was made of transition systems: interfaces to caller and callee (figures 2 and 3), mailbox for communication (figure 4), and memorization of the status of the Rendezvous (figure 5), with respect to both parties. In the callee's interface, the "end rdv" is a signal to the upper layer signaling the end of the Rendezvous execution. So this transition is related to no other one in the synchronization constraints. To model our four types of Rendezvous, we use different sets of those basic components, and different synchronization constraints between them. These synchronization constraints are expressed as vectors of labels, each label belonging to a given component, as shown in figures 6 and 7.
Another part of the functional description consists of the list of properties the system had to satisfy to be considered correct; actually we defined a list for the weak Rendezvous and another one for the strong Rendezvous, including absence of deadlock, control of memory, and consistency of mailboxes and registers.
The main results of the study have been: obviously a validation of the functional architecture, but also quantitative gains, as only two bits were proven necessary per Rendezvous and some controls could be removed, strongly lowering memory use. 
Presentation Layer
This layer stores the messages from the scheduler and transmits them one by one to the link layer; the service provided by this layer is the transfer of frames from upper to lower layers, at a pace compatible with the flow between processors. A potential danger is the overflow of the local storage, or an underflow due to a wrong management of this store. In case of repeated weak Rendezvous calls, some information could be lost without any control. The link handles such a situation by signaling the Rendezvous status to the presentation layer. The latter propagates the status to the application layer, in order to replace extra calls (that would lose messages not yet treated) with a global call carrying a global value, depending on the Rendezvous involved.
Functional Aspects are related to the underlying design principle, that is a critical section to protect a shared resource.
So we model a scheduler, a status of frame, a number of messages, a service for the scheduler to request, an indication for the link to signal free, and the link.
This leads to a synchronized transition system of 17 states and 29 transitions, which reveals that a deadlock occurs only if the stack of Rendezvous overflows.
Overflow of Stack is difficult to rule out in this tight context. A first point is that there is no queue for the Rendezvous, for no Rendezvous call is allowed in the body of the procedures of the kernel. So there is no need to handle more than a boolean for each Rendezvous. The second point is that the maximal number of Rendezvous is known at compilation time, except for weak Rendezvous where several occurrences of the same Rendezvous can happen.
So we can propose to implement this supposed dynamic structure as a static array of bits, indexed with the numbers of the Rendezvous, and a message counter.
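A minimal sketch of such a static structure is given below. It is an illustration under assumptions, not the kernel's code: the class and method names are hypothetical, the bit array is packed in one integer, a repeated call on an already pending Rendezvous is simply reported to the caller, and pending Rendezvous are served lowest number first.

```python
class PendingRendezvous:
    """Static record of pending Rendezvous: one bit each, plus a counter."""

    def __init__(self, count):
        self.count = count      # number of Rendezvous, fixed at build time
        self.bits = 0           # bit i set <=> Rendezvous number i is pending
        self.messages = 0       # number of pending entries

    def post(self, number):
        """Mark a Rendezvous as pending; return False if it already was."""
        if not 0 <= number < self.count:
            raise ValueError("unknown Rendezvous number")
        mask = 1 << number
        if self.bits & mask:
            return False        # repeated call: status reported to the caller
        self.bits |= mask
        self.messages += 1
        return True

    def take(self):
        """Remove and return the lowest pending number, or None if empty."""
        if self.messages == 0:
            return None
        number = (self.bits & -self.bits).bit_length() - 1  # lowest set bit
        self.bits &= ~(1 << number)
        self.messages -= 1
        return number

# Hypothetical usage with three Rendezvous.
pending = PendingRendezvous(3)
first = pending.post(2)
second = pending.post(0)
repeated = pending.post(2)
```

Since the size is fixed, the structure itself cannot overflow; what remains to be validated is the handling of repeated weak calls, which is where the counter and the reported status come in.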
The validation of this handling is carried out by adding to the previous model two extra transition systems, one keeping track of the weak Rendezvous calls emitted and not yet treated, the other modelling the array of booleans. The number of booleans is set to three.
This yields a synchronized transition system of 96 states and 236 transitions. Computed properties pointed out critical aspects in the design of stack handling. Actually, due to this early discovery, the implementation prevents the use of such faulty executions.
This work has been reported in [10].
Link Layer
The link layer guarantees services at the frame level (no blocking, no loss, no duplication of bytes) provided the lower layer ensures correct transport of bytes. The solution is based on the well-known alternating bit protocol, as presented in [6]. It consists in adding to the payload information (number of the Rendezvous, possible message) some extra structure information (an extra alternating bit, a byte count, and some redundancy through a CRC), working in a half-duplex way.
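The frame structure just described can be sketched as follows. The field layout (one byte for the alternating bit, one byte for the payload length, a trailing 16-bit CRC) and the function names are assumptions chosen for illustration; the actual frame format of the meter is not given here. The CRC is computed with `binascii.crc_hqx` from the Python standard library.

```python
import binascii

def build_frame(alt_bit, payload):
    """Assumed layout: [alternating bit][byte count][payload][CRC-16]."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte count")
    body = bytes([alt_bit & 1, len(payload)]) + payload
    crc = binascii.crc_hqx(body, 0)
    return body + crc.to_bytes(2, "big")

def check_frame(frame, expected_bit):
    """Return the payload of a valid, new frame, or None otherwise.

    A frame is rejected when its CRC or byte count is wrong (corruption)
    or when its alternating bit is not the expected one (duplicate).
    """
    if len(frame) < 4:
        return None
    body, crc = frame[:-2], int.from_bytes(frame[-2:], "big")
    if binascii.crc_hqx(body, 0) != crc:
        return None
    alt_bit, count = body[0], body[1]
    if count != len(body) - 2 or alt_bit != expected_bit:
        return None
    return body[2:]

# Hypothetical payload: Rendezvous number 5 followed by a two-byte message.
frame = build_frame(0, b"\x05hi")
accepted = check_frame(frame, expected_bit=0)
duplicate = check_frame(frame, expected_bit=1)
corrupted = check_frame(frame[:2] + bytes([frame[2] ^ 1]) + frame[3:], 0)
```

The receiver flips its expected bit after each accepted frame, so a retransmitted frame carrying the old bit is recognised as a duplicate and discarded rather than delivered twice.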
To guard against an erroneous reception of the structure information, we need some watchdogs on both ends of the link. Unfortunately one of the processors is not equipped with a suitable device. To remove this difficulty, fixed length frames are used in that direction, and watchdogs are set both for emission and reception on the other processor.
An expected problem for this layer was the non-instantaneous transfer at the physical level, where the data are exchanged in a full-duplex way. To deal with that problem we decided to use Formal Methods; so we modelled both lines in an abstracted way, and each link (left, right) also.
The computations with MEC on the model pointed to four blocking states, attributed to the uncontrollability of some events relating the link layer to the upper one; these states led to the loss of the beginning of some frame. This reflects the physical asynchronism of both underlying processors: one is ready to send, the other one is not yet ready to receive. Then a solution was designed, where the link layer prepares the beginning of the next frame (to be ready to receive) immediately before sending the previous one (and thus handing over). The MEC computations validated this solution.
Hardware Protocol
An interesting point of the project is the study of the hardware protocol between both processors, as it relies on both software and hardware aspects of the system. It happened to be one of the first points studied, and the way it worked with engineers of both cultures was a great help to a better mutual understanding. The fact that Formal Methods made it possible to remove a diode from the wiring impressed the people involved in the project and made these methodological tools suddenly and strongly credible.
The aim of this protocol is to forward bytes from the DSP to the µC and vice versa. The protocol must meet the following property: no byte is lost on a hypothetical perfect line.
It is important in this matter to keep in view the physical architecture, and the abilities of the different "wires" to handle information. A representation of the physical structure can be found in figure 8. Transfers of data to and from the registers of the micro-controller are controlled through interrupts, and special attention must be paid to the fact that the DSP supports no interrupt mechanism: data entering the input register must be polled regularly. Another feature of the DSP registers is their structure; they are double-sized shift registers. The µC registers are one byte wide.
Loss of data due to the polling rate is definitely difficult to detect by statistical simulation. The only solution consists of a protocol that absolutely avoids this risk.
Loss or spontaneous creation of data on the line models the perturbation of the line by interference, namely spikes losing or creating ticks on the clock of the synchronous serial lines. This implies a desynchronization in the flow of bytes. The only solution is a hardware reset on a time-out condition after communication has been erroneous for too large a number of bits. Nevertheless, the physical layer protocol is founded on the use of a special character, denoted "b"; an alteration involving such a character is a potential cause of blocking of the data flow or of misinterpretation of frames. So to prove anything in that direction we have to be able to distinguish unambiguously the emitted, transferred, and received bytes. Actually, all bytes are dealt with in the same way in the model (as in real life) and we have to keep track of one byte. On the other hand we have to deal with a common clock on both lines; this implies that a signal must be considered on each line, providing the link with false bytes. So we consider only three different values of data to be transmitted through the link: "b", "X" and "Y". We are then able to distinguish one of them from all the others. This transition system is displayed in figure 9 in a generic representation, where a single generic label stands for each of "b", "X", "Y".
The physical objects we model are: the serial lines; the signals SORQ, SOEN, and the clock; and the four registers. As stated before, register management is a central point in the study. This is achieved using a protocol defined on each processor in the way shown in figures 10 and 11. This protocol has been designed in an experimental way, every attempt being validated or not using the model. In order to make sure every byte is correctly routed whatever has happened or is happening, the model has been completed by various sets of two testers generating arbitrary sequences of bytes, in one direction or in both.
In the most general case, both emitted a sequence of X:Y:X, and on the other side the Y had to be neither lost nor duplicated. In that precise case, the model amounted to 60,737 states and 112,654 transitions.
This work is reported in [10] and [5].
Development of the Application
Unlike what had happened for the kernel, the application part of the software has not been developed using Formal Methods from the very beginning; this does not reflect a methodological position: the main reason is that (academic) manpower was not available, and that we needed a sound and efficient kernel, at the border of electronics and software.
Unfortunately some difficulties arose in this part (application) of the project, and we had to step in at the detailed design phase of the development.
Formal Methods have then been used in two ways: the first consisted in checking the consistency of the Rendezvous calls among tasks, especially in critical phases of the application, such as turning on or off, above all when the power is cut during the starting-up phase or comes back during the stopping phase; the second was related to the validation of the respect of deadlines for every task, whatever could happen. This last point is a priori out of reach of the model used, but combined with Rate Monotonic Analysis techniques it proved efficient in handling this kind of temporal aspect.
Validating the call graph
In order to verify the mutual interlocking of the various tasks, those have been abstracted to their call structure as shown in figure 12, and the behaviour of each of them has been reduced to the control related to Rendezvous calls depicted in figure 13. The Rendezvous semantics has been expressed as a set of synchronization constraints.
Unfortunately it appeared that the synchronized product of these 36 tasks could not be built in a reasonable time (a couple of days) using a server as a workstation: the model amounted to too many states (more than 4 million). Actually the model of the behaviour of the starting-up phase was not fully built! So we decided to study the call graph using a partial and progressive approach: the sets of initial states have been modified to restart the exploration from each new point of the system, up to what could physically be built. This approach left it uncertain whether the whole model had been studied.
This process allowed the detection of interlocking situations among Rendezvous calls, and thus early modification of the call architecture; it also allowed the detection of controls on some Rendezvous that were actually of no use, and thus could be suppressed. These bugs would not have happened if Formal Methods had been applied earlier, at the design stage. A minor defect has been revealed after tests, in a part that had not been studied. In a way, this confirms that Formal studies are both less practicable and less efficient when applied at the end of the project.
This work is reported in [10].
Dealing with deadlines
The model used did not basically handle quantitative temporal aspects, and the tools available at this time proved inefficient to deal with so many tasks and so large a time scale: some of the tasks had to be performed at a 50 kHz rate, and another one had a period of more than 6 months! On the other hand, the temporal properties we intended to verify had more to do with the respect of deadlines than with exact timing validation for each instruction to be performed. So we moved to a study following the Rate Monotonic Analysis approach, as the prerequisites for its application were fulfilled (static tasks, static priorities ...) even if there was no proper theoretical scheduler in the system, but a Rendezvous manager. Let us recall that in this approach the static priority order is the period order.
The model as presented in [9] and [11] has been extended to take into account switching durations and real run-time priorities. The temporal characteristics of the circuits have been measured on the bench and the algorithm applied. It appeared that the deadlines were not met at all.
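To illustrate the effect of switching durations, the sketch below uses the classical response-time iteration for fixed-priority tasks, charging two context switches to every job. This is one common way of accounting for switching costs and is an assumption here: it is not claimed to be the extension actually used in the project, and the task values are hypothetical. Tasks are (cost, period) pairs listed by decreasing priority, with deadlines equal to periods.

```python
def response_time(tasks, index, switch=0):
    """Worst-case response time of tasks[index], or None if it misses.

    Iterates R = C_i + 2s + sum over higher-priority j of
    ceil(R / T_j) * (C_j + 2s) until a fixed point is reached or the
    period (taken as the deadline) is exceeded.
    """
    cost, period = tasks[index]
    own = cost + 2 * switch
    r = own
    while r <= period:
        interference = sum(
            -(-r // t) * (c + 2 * switch)   # -(-r // t) is ceil(r / t)
            for c, t in tasks[:index])
        nxt = own + interference
        if nxt == r:
            return r
        r = nxt
    return None

# Hypothetical task set in integer time units, highest priority first.
tasks = [(1, 4), (2, 6), (3, 12)]

ideal = [response_time(tasks, i) for i in range(len(tasks))]
with_switching = [response_time(tasks, i, switch=1) for i in range(len(tasks))]
```

With free context switches every task of this example meets its deadline; charging one time unit per switch is enough to make the second and third tasks miss theirs, which shows how a task set that looks feasible under the ideal hypotheses can fail once real switching durations are measured.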
The first idea was then to modify the tasks that were too long, artificially cutting them into shorter ones, thus modifying the priority order. Then, after some attempts, the algorithm could be run again on this new task organization.
The second idea was that this new set of tasks could be obtained with no code change (very few actually), and that this modification could not put the previous validation in danger. This has been made possible because the synchronized transition systems were a unifying tool between both models.
The actual computation of RMA values has been achieved using Excel 5.0 in its iterative mode.
This work is reported in [10] and [1].
Formal Methods are usually thought of as code validation tools after detailed design, or less often as an architecture validation tool at design time, but their investigative power is supposed to be irrelevant to the testing step of the development: they are credited with shortening this last step, but without any direct involvement in it.
The experiment we present here is a counter-example to this narrow view, as it emphasizes the role of the investigative power of Formal Methods even in the testing phase.
This case refers to some rare low-level mechanism driving a signal multiplexer. At a given time t, one among N analog signals is input to the measuring software, by means of a 1-in-N multiplexer, driven by an external counter (external to the software). So we have two counters referring to the same value: an external (hardware) counter, selecting the input signal, and an internal (software) counter, processing the input signal, both of them being controlled by the same clock. The first problem we have to deal with is that interference can affect the links between the common clock and both devices, and thus desynchronize both counters, making the software process input values erroneously.
This can be handled either by identifying each sample transmitted to the software or by periodically synchronizing both counters. The first solution was not relevant for technical reasons, and it has been decided that an output port would be used by the software counter to reset the hardware counter.
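The desynchronization problem and the periodic reset can be illustrated by a small simulation. Everything in it is a hypothetical abstraction made for illustration: both counters count modulo the number of inputs, interference is modelled as clock ticks missed by the software counter only, and the reset is issued whenever the software counter wraps to zero.

```python
def misattributed_samples(n_inputs, ticks, lost_by_software=(), resync=True):
    """Count samples processed while the two counters disagree.

    n_inputs         : number of multiplexed inputs (counters run modulo it)
    ticks            : number of clock ticks simulated
    lost_by_software : ticks at which the software counter misses the clock
    resync           : whether the software resets the hardware counter
                       each time its own counter wraps to zero
    """
    hardware = software = 0
    wrong = 0
    for tick in range(ticks):
        if hardware != software:
            wrong += 1          # sample attributed to the wrong input
        hardware = (hardware + 1) % n_inputs
        if tick not in lost_by_software:
            software = (software + 1) % n_inputs
        if resync and software == 0:
            hardware = 0        # reset sent through the output port
    return wrong

# Hypothetical scenario: 4 inputs, 12 ticks, one tick lost by the software.
clean = misattributed_samples(4, 12)
with_reset = misattributed_samples(4, 12, lost_by_software={1})
without_reset = misattributed_samples(4, 12, lost_by_software={1}, resync=False)
```

In this scenario a single lost tick corrupts every later sample when no reset is issued, whereas the periodic reset confines the error to the samples taken before the next wrap of the software counter.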
In such an embedded system as the electricity meter, various processes have to be activated to ensure the reliability of the apparatus. One of them is the OUCABO system. This system requires a device to be periodically activated and tested. The second problem is that the solution to the first problem left no output port available to activate a secondary device required for reliability reasons, nor any input port to test it. This situation has been handled using the leftmost carry bit of the hardware counter to activate the device: if the software counter does not reset the hardware counter in due time, an arithmetic overflow occurs and the device is activated; this had been conceived and implemented without Formal Methods, as an electronic hack to answer a very simple, low-level problem.
This part of the system had been developed and implemented, and tested using laboratory testing devices to measure the same signals simultaneously. Unfortunately, a very rare measurement difference occurred between testing and tested devices; at first glance, the origin could be hardware, software or an error in the testing protocol.
Testing such rare behaviours may last a long time, and in order to save time, a model-checking approach was suggested. The system has then been modelled using synchronized transition systems. The error and the error rate could be confirmed by a study of synchronized transitions, thus identifying the source of the erroneous behaviour. The bug was fixed in the model and, after validation, the system could be efficiently and safely corrected.
Notice however that we did not intend to model and verify an implementation: we specified the system a posteriori by abstracting the elements relevant to the misbehaviour.
This work is reported in [10] and [4].
Conclusion
This study has been an opportunity to test several cases of use of Formal Methods in a real industrial project, all along the project. From design to test, through detailed design and architecture, the synchronized transition systems, standing for a Formal algebraic framework, played an active and efficient role in the development of the product. In such a matter numbers are not easy to exhibit, for confidentiality reasons; we estimate the extra manpower effort at Schlumberger dedicated to Formal Methods at about two percent of the global (hardware and software) development load. The academic companion effort is about one permanent half-time position, strengthened for several defined periods by up to five researchers.
The return on this extra effort is manifold: first, the cost has been kept within the original figure; second, the development has been more fully under control, and the integration and metrological tests have been passed straight through; third, the quality of the product (the system's components, not the meter) allowed easy reuse in several variants by different teams.
For the LaBRI this project was an illustration of the benefit of a Formal approach for software engineering, and an opportunity to join two Formal Methods (Synchronized Transition Systems and Rate Monotonic Analysis) to cover, in some cases, temporal validation.
