A codesign methodology is proposed which is suitable for control-dominated systems but can also be extended to more complex ones. Its main purpose is to optimize the trade-off between hardware performance and software reprogrammability and reconfigurability. The proposed methodology is intended to cover the development of the whole system. It deals in greater detail with the steps that can be carried out without any particular assumption regarding the target architecture. These steps concern splitting up the specification of the system into a set of individually synthesizable elements, and then grouping them for the subsequent mapping stage. In order to decrease the complexity of each partitioning attempt, a two-step algorithm is proposed, thus permitting a wide exploration of possible solutions. The methodology is based on the TTL language, an extension of the T-LOTOS Formal Description Technique which provides a large number of operators as well as a formal basis. Finally, an example covering the complete design cycle, except for the allocation stage, is provided.
INTRODUCTION
The design of complex systems comprising hardware and software elements is of considerable interest on account of its extremely varied applications, thanks to the current availability of low-cost hardware devices. It is of fundamental importance to optimize both the cost and the performance of such systems; various studies have been carried out on this kind of design, which is commonly called codesign. Codesign is an approach to the development of systems composed of both hardware and software modules [1, 2]. Its main purpose is to optimize the trade-off between hardware performance and software reprogrammability and reconfigurability. Moreover, the aim of codesign is to be able to design a whole system without excessive preliminary constraints on mapping the modules onto hardware and software parts [3, 4]. At present the sector which seems to offer the best prospects for the application of codesign methodology is that of embedded control-dominated systems, thanks to their low complexity. In these systems output signals are caused directly by input signals, which generally means that the systems do not require extremely complex processing of the input signals.
Embedded systems are often used in life-critical situations, where reliability and safety are more important criteria than performance. For this reason we believe that the design approach should be based on the use of a formal model to describe the behaviour of the system before a decision on its implementation is taken.
In this paper we propose a codesign methodology which is primarily applied to control-dominated systems, but can also be extended to more complex ones.
In order to achieve the final partitioning it is necessary to define the processor, hardware components and interfaces generally referred to as the target architecture. The methodology proposed currently refers to an architecture including a single general-purpose processor and a few application-specific hardware components (ASIC or FPGA), a single-bus master software component and a single-level memory hierarchy [5] [6] [7] .
As the proposed methodology is intended to cover the development of the whole system, that is, from the specifications in terms of both time and behaviour to the implementation of its components (the software components by using a programming language, the hardware ones by synthesis), certain choices have to be made, especially that of the technique used to describe the system.
The language used for specification of the system is TTL [8] (Templated T-LOTOS), an extension of T-LOTOS [9] specially developed for use in codesign. TTL is also a valid tool for the subsequent stages of development, on account of its formal basis and the operators it provides. TTL allows consistency to be tested using mathematical properties instead of simulation approaches; in this sense the methodology is said to be formal. The issues this paper deals with in greater detail are the way in which the specification of the system is split up into a set of individually synthesizable elements, and the way in which they are grouped prior to the mapping stage. These choices are made without the need for any particular assumption regarding the target architecture, which only needs to be chosen before mapping. However, the paper does not deal in detail with the problem of mapping, because most of the approaches in the literature can be used.
The final part of the paper presents a case study in order to evaluate the proposed design methodology.
The steps needed to go from specification to implementation are sketched in Section 2. Section 3 gives a brief description of the formal technique used in the design. Section 4 describes in more detail the process of specification and decomposition. Section 5 explains preclustering, which is a part of partitioning. Section 6 discusses implementation and Section 7 introduces a case study for the present methodology. Section 8 provides the authors' conclusions.
AN OUTLINE OF CODESIGN METHODOLOGY
Figure outlines the main steps which go from specifications to implementation of the system. The first step in the methodology is, therefore, the development of the specifications, using TTL. The specification stage is followed by the splitting stage which is subdivided into two steps: refinement and decomposition. The first splitting step, called refinement, makes the description of the system less abstract, thus passing from specification of the requirements of the system (maximum abstraction) to a structured representation (minimum abstraction). The refinement step, in which specifications are made less abstract, in reality includes several cycles of subsequent refinement. The TTL language provides adequate support during the whole stage, thanks to its operators and formal basis.
The second splitting step, called decomposition, consists of dividing the specifications up into a set of elements (called tasks) which can be synthesized separately. Decomposition is based on syntactic and semantic specification characteristics, as discussed in greater detail in Subsection 4.2 (a similar approach to specification can be found in [10] ).
Thanks to this approach the computational complexity required is quite low.
At this stage, however, there are still no constraints on whether an element is to be implemented in hardware or software.
In the methodology proposed, the set of tasks obtained from the decomposition step undergoes two further stages which together perform partitioning. In the first (preclustering) the number of tasks is reduced below a certain threshold (grouping them into so-called clusters)
in [11, 12], Petri nets [13] and high-level languages [5, 14, 15, 16]. In this paper we use TTL (Templated T-LOTOS) as the specification language. It is an extension of T-LOTOS [9, 17] which is suitable for the codesign approach.
The main extensions TTL introduces to T-LOTOS are modularization (allowing, for example, the use of libraries), use of templates (allowing the definition of a generic process) and the introduction of an iterative construct (loop) [8] .
The language does not allow rate constraints on actions to be specified directly. This is not a problem, however, as for practical purposes a rate delay can always be expressed as min/max or a fixed delay [18].
TTL has been developed in such a way as to use all the existing tools for T-LOTOS (e.g., Lola [19] ). In fact it is possible to translate a TTL specification into T-LOTOS using only syntactical transformations. TTL can be supported by a set of graphic tools which allow the designer to specify the behaviour of the system in a simple, immediate, familiar way. A possible approach would be like the one followed in [20] , which illustrates a technique by which it is possible to go from specification of the system by time diagrams to a T-LOTOS specification; since TTL is a superset of T-LOTOS a similar tool can be built.
The language has two components: the first is the description of the behaviour of processes and their interaction, and is mainly based on the CCS [21] and CSP [22] models; the second is the description of the data structure and expressions, and is based on ACT ONE [23] , a language for the description of Abstract Data Types (ADTs).
The syntax of the most important TTL operators is summarized in Table I; a complete description of TTL syntax and semantics can be found in [24].
a system-level description and proceeds by splitting the system into increasingly smaller pieces, until it reaches a level at which the single pieces can either be constructed by combining library components or are described directly. Figure 2 shows the refinement process during the preliminary stages of codesign.
Level 0 coincides with top-level system specification: at this level it is preferable to describe the system in as abstract a way as possible. The next n steps go from the abstract description of the system to a concrete one: at each refinement step the functional blocks are split into more elementary ones, leaving the behaviour of the system unchanged. Consistency between the description of the system at level n and that at level n-1 is verifiable thanks to the formal basis of the language, whereas traditionally it was checked by simulation. The use of a language like TTL therefore supports a better approach to system specification. The aim of the refinement step is to obtain specifications which can be efficiently implemented and, at the same time, represent the same system as level 0.
The division into modules also requires definition of the signals that have to be exchanged among the modules, which are usually called internal signals.
Exploiting the formal basis of the language, TTL aids the designer throughout the refinement process, giving mathematical certainty that the descriptions in the various steps are consistent.
The modularity of TTL also makes it possible to implement and include library components which have already been tested and used. This plays an important role in this step of the methodology, as the replacement of certain blocks with library modules which have a hardware counterpart notably increases the efficiency of the system.
Decomposition
The aim of the system decomposition stage is to identify the main functional blocks of the system being developed. These blocks represent the bricks which will be used in the subsequent stages to perform partitioning. Decomposition consists of splitting up the TTL specifications until a set of tasks which can be synthesized separately is found.
The main aim of codesign methodologies is to identify the blocks which permit the trade-off between performance and manufacturing costs to be optimized. It is, however, not desirable at this stage to make choices that constrain whether a block must be mapped onto hardware or software. This would reduce the degree of freedom in the partitioning phase and therefore the possibility of obtaining a near-optimal system. In addition, making implementation choices at this stage would reduce the possibility of re-designing the system. Therefore, to obtain the maximum independence from the final implementation, the decomposition criteria used must not involve choices which depend on the target architecture or on considerations concerning implementation. The data used must therefore be obtained from the characteristics of the specifications alone.
Given the features of TTL there are two possible alternatives for the choice of parts we consider to be elementary.
Considering the single TTL constructs to be elementary, i.e., considering the operators which make up the behavioural expressions (external offer, choice, etc.).
Considering the processes to be elementary.
The first hypothesis can be discarded straight away as it would lead to an excessively high number of tasks, thus introducing too high a degree of complexity and fragmentation. The second is more plausible, also in view of the subsequent translation from TTL into the language which will be used to implement the system. Due to the tool currently used to translate the specification into synthesizable languages, if other processes are instanced inside a given process they have to be part of it. This restriction could be lifted by using an ad hoc TTL synthesizer or some other translator which also accepts generic processes.
The decomposition process starts from the main specification and decomposes it according to the parallelism between the various processes. Figures 3a and 4a give some examples of decomposition into tasks. In the first example three tasks are obtained, as they instance no other processes and are each parallel with the other two. In the second example the result of the decomposition process is two tasks, as process P2 instances P3 and so they constitute a single atomic element.
Decomposition can be performed automatically by means of a recursive algorithm which applies the considerations made previously.
The starting point of the decomposition algorithm is the tree which represents the hierarchy of nopara, which means the opposite of para. The attribute nopara is also assigned to processes which instance themselves (like process P1 in Fig. 3 ).
Figure 5 shows an example of a tree of processes, where the single nodes are marked with the relative attribute. The algorithm for the decomposition into tasks is described in Figure 6 (the function do_task is described in C-like language).
In the algorithm, attrib(node) indicates the function which returns the value of the node attribute, 
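As an illustration only, the recursive decomposition can be sketched in Python. The sketch is not the algorithm of Figure 6: it assumes that a node marked para is a parallel combination whose children are decomposed separately, and that a node marked nopara forms a single task together with every process it instances. The names Node, collect and the example tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A process in the instantiation tree (hypothetical representation)."""
    name: str
    attrib: str                      # "para" or "nopara"
    children: list = field(default_factory=list)


def collect(node):
    """Return the names of a process and of every process it instances."""
    names = [node.name]
    for child in node.children:
        names.extend(collect(child))
    return names


def do_task(node, tasks):
    """Assumed rule: split para nodes, keep nopara subtrees as one task."""
    if node.attrib == "para":
        for child in node.children:
            do_task(child, tasks)
    else:
        tasks.append(collect(node))
    return tasks


# Shaped like the second example in the text: P1 in parallel with P2,
# where P2 instances P3 and therefore forms a single atomic element.
spec = Node("main", "para", [
    Node("P1", "nopara"),
    Node("P2", "nopara", [Node("P3", "nopara")]),
])
tasks = do_task(spec, [])
print(tasks)
```

Under these assumptions each process is visited once, which is consistent with the low computational complexity claimed for the decomposition step.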
Pre-clustering
The aim of preclustering is to reduce the number of tasks to be partitioned with the purpose of reducing the complexity of the problem of partitioning.
The number of "sets of tasks" (which will be called clusters) generated by preclustering is obviously of critical importance for mapping. If this number is too high the complexity of the problem is not significantly reduced, whereas if it is too low the mapping will not achieve a good cost-performance trade-off. This is due to the fact that the only stage where delays and manufacturing costs are taken into account is mapping.
A lower computational cost would suggest executing preclustering until the number of clusters is so low that they can be allocated without making any choices in the mapping stage. On the other hand, the greater number of parameters taken into account during the mapping stage would suggest giving it as many optimization chances as possible by providing a large number of clusters. The best solution is probably a compromise between the two strategies. The most suitable number of clusters that preclustering has to provide to the mapping stage is very difficult to establish a priori, and up to now our methodology has proceeded by trial and error.
However, we are working on partially automating this choice, basing it on data collected during previous design cycles and interactions with designers.
The preclustering algorithm adopted to group tasks attempts to minimize the coupling degree among the tasks, defined as the "number of interactions between two tasks". We believe the coupling degree is critical for implementation of the final device, mainly because the higher it is, the more communication there will be, which increases the cost connected with interfaces. The preclustering stage works on the system before the choice of target architecture, so it is not possible to know the manufacturing cost or the delay cost. The coupling degree, instead, can be evaluated and it appears to be a valid heuristic for reducing the complexity of the problem: by minimizing the coupling degree between clusters, tasks with more interactions will be grouped together and will be mapped onto the same partition (either software or hardware).
On the basis of the tasks output by the decomposition process, the preclustering algorithm constructs a weighted (with respect to the coupling degree) graph of the various tasks and works on this to group them into separate clusters. (2) and task (3) as they are the ones which interact the most. Figure 9 shows the preclustering algorithm written using a C-like syntax.
When execution terminates, the set C will contain the n clusters which minimize the total coupling degree function, defined as:
Figure 10 shows the various steps of the algorithm when applied to the simple example in Figure 7, with n = 2.
Figure 11, Figure 12, which only reproduces the portion of the graph we are interested in. On the basis of the algorithm illustrated above, task (2) could be clustered with either (1)
lower number of tasks for each. This modification, along with some others, has not been shown for the sake of simplicity but they are implemented in the working program. Having smaller clusters means it is easier to explore hw/sw trade-offs and consequently obtain a better final solution.
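The grouping idea can be illustrated with a short Python sketch. It is not the algorithm of Figure 9: it assumes a simple greedy strategy that repeatedly merges the two clusters with the highest coupling degree until n clusters remain, and it takes the total coupling degree to be the sum of the weights of the edges crossing cluster boundaries. The function names and the example weights are hypothetical and do not reproduce the graph of Figure 7.

```python
def weight(a, b, coupling):
    """Coupling degree between two clusters: sum of the task-to-task weights."""
    return sum(coupling.get(frozenset((x, y)), 0) for x in a for y in b)


def precluster(tasks, coupling, n):
    """Greedily merge the most strongly coupled clusters until n remain."""
    if n < 1:
        raise ValueError("n must be at least 1")
    clusters = [frozenset([t]) for t in tasks]
    while len(clusters) > n:
        pairs = ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:])
        a, b = max(pairs, key=lambda p: weight(p[0], p[1], coupling))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


def total_coupling(clusters, coupling):
    """Assumed cost: weight of all interactions that cross cluster boundaries."""
    return sum(weight(a, b, coupling)
               for i, a in enumerate(clusters) for b in clusters[i + 1:])


# Hypothetical weighted task graph: tasks 2 and 3 interact the most.
coupling = {
    frozenset((1, 2)): 2,
    frozenset((2, 3)): 3,
    frozenset((3, 4)): 1,
}
result = precluster([1, 2, 3, 4], coupling, n=2)
print(result, total_coupling(result, coupling))
```

A greedy merge of this kind does not guarantee a globally minimal coupling degree, and ties between equally coupled pairs are broken arbitrarily here; a working tool would need an explicit tie-breaking rule, for instance one that favours smaller clusters.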
Mapping
This is the stage where the various clusters output by the preclustering stage are classified as hardware or software and allocated to the target architecture. The purpose of this stage is to allocate each module either to software or to hardware, trying to maximize the performance of the system and minimize the monetary cost of manufacturing. To achieve this result the system must find the best allocation for each module. The mapping is influenced by the target architecture chosen because it imposes requirements on size (of the hardware part, of the memory available, etc.) and on the interfaces between hardware and software. Moreover, the scheduling algorithm has a strong impact on the performance of the system [25, 26].
The methodology has been devised in such a way as to leave a wide choice of partitioning methods. The mapping problem is not addressed by this paper; some interesting strategies can be found in [27] and [28]. Each of them can easily be integrated in our methodology and benefits from the reduction in the number of input tasks.
proposed, we present a system to control the production of electricity in a hydroelectric plant. The aim of the example is to show the applicability of the method to quite complex real systems.
Specifications
The controller essentially has two functions: it has to check the level of the reservoir to make sure it does not exceed a certain limit, and then directly control the production of electrical power.
The system provides for two functioning modes, manual and automatic. In the first mode the parameters involved in power production are supplied manually from the outside, while in the second mode everything is controlled automatically by a daily production program.
The controller presented comprises several blocks: the clock, the daily program, the control panel, the regulator and a set of actuators. Figure 13 shows the structural interconnection between the various blocks, which we reached after performing several refinement steps on the abstract specifications of the system. Figure 14 shows the main TTL specification of the system.
In giving a detailed description of the features of the individual blocks, we will make use of the modularity offered by TTL. The processes B1, B2 and B3 perform a sort of logical OR on the input signals, so as to guarantee correct functioning both in the automatic mode and during the transition from manual to automatic. Figure 17 gives a definition of these three processes.
Daily Program
This block manages the daily automatic production of electricity. Figure 18 shows the declaration of the module.
The Daily Program module is a parallel combination of two processes, DP1 and DP2 (see Fig. 18); the first turns the plant on and off, while the second regulates the level of production. On the basis of the time signal, the process DP1 (see Fig. 18 for a definition) turns the plant on and off. It is turned on at 6.00 am (by emitting the prgstart signal) and turned off at 9.00 pm (by emitting the prgstop signal).
The process DP2 (see Fig. 18 ) uses the time signal to regulate the level of production of electricity. It acts indirectly on the aperture of the valve (signal prgwidth: 0 completely closed, 10 completely open) which regulates the flow of water into the power plant. According to the daily requirements, production is divided into three time bands:
From 9.00 pm on the previous day to 5.00 am on the next day, aperture 0, corresponding to no production of electricity;
From 6.00 am to 6.00 pm, 70% of maximum production;
From 7.00 pm to 8.00 pm, 50% of maximum production.
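The three time bands can be summarized as a small lookup, sketched here in Python rather than TTL. The sketch assumes that the time signal carries the hour as an integer from 0 to 23 and that the prgwidth scale (0 to 10) is proportional to the percentage of maximum production; both assumptions, and the function names, are illustrative and are not taken from the specification in Figure 18.

```python
def production_percent(hour):
    """Percentage of maximum production for an hourly time signal (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0-23")
    if 6 <= hour <= 18:      # 6.00 am to 6.00 pm
        return 70
    if 19 <= hour <= 20:     # 7.00 pm to 8.00 pm
        return 50
    return 0                 # 9.00 pm to 5.00 am: no production


def prgwidth(hour):
    """Valve aperture on the 0-10 scale (assumed proportional to production)."""
    return production_percent(hour) // 10


for hour in (5, 6, 18, 19, 20, 21):
    print(hour, production_percent(hour), prgwidth(hour))
```

Because the time signal is emitted once per hour, the three bands cover every value the process can receive.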
Clock
Automatic management of production requires knowledge of the real time, which is provided by the block called Clock. Figure 19 shows the statement of the module and definition of the main process. As can be seen, the Clock block has been decomposed into a parallel combination of two processes, counter and ck. For the purpose of automatically managing the production of electricity, it is sufficient for the time signal to be emitted every hour. We therefore implemented the counter process (see Fig. 19)
The processes R1 and R2 function in a similar way (see Fig. 21).
Act3, as said above, serves as an interface with a stepper motor which can move by steps towards increasing or decreasing angles, according to whether a signal up or dw is sent. The aim of Act3 is to send as many up (or dw) signals as the steps supplied by fwidth. Figure 24 gives the declaration of the module and a definition of the main process.
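The behaviour described for Act3 can be pictured with a few lines of Python. This is only a sketch: it assumes that fwidth is a signed integer whose sign selects the direction (positive for up, negative for dw), which the text does not state, and the function name act3_signals is hypothetical.

```python
def act3_signals(fwidth):
    """Return the sequence of stepper signals for a signed number of steps."""
    direction = "up" if fwidth > 0 else "dw"
    # One signal per step; zero steps produce no signal at all.
    return [direction] * abs(fwidth)


print(act3_signals(3))
print(act3_signals(-2))
```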
Decomposition
The specification of the system is given in such a way as to obtain the maximum number of tasks in the decomposition phase. Figure 25 shows the tree which represents the hierarchy of TTL processes on the basis of how they are instanced.
Applying the decomposition algorithm, we obtain the following tasks:
T1 = B1, T2 = B2, T3 = B3, T4 = CNTRL, T5 = Act1, T6 = R1, T7 = R2, T8 = R3, T9 = R4, T10 = Act2, T11 = DP1, T12 = DP2
Among the other languages used for co-specification we can cite two examples: Cx, the entry language for COSYMA [33], which extends ANSI C with delays, tasks and task communication, and Hardware C [34], which can be translated into a flow graph.
In the methodology introduced in this paper the specification language used is TTL. It is derived from T-LOTOS, an FDT based on the CCS and CSP process algebras. TTL appears suitable for describing control-dominated systems, as discussed throughout the paper.
As shown in the paper, a key problem in codesign methodologies is the validation of the model of the system being developed. Simulation is still the main tool used for this purpose and consists of comparing the model against a set of specifications. Many methods have been proposed in the literature; they differ in the way they couple hardware and software components. For example, in [35] a single custom simulator is used for both hardware and software, whereas another approach proposes using a software process running on a host computer loosely connected with a hardware simulator [36].
TTL aims to perform verification on the specification. Formal verification is the process of checking that the behaviour of the system satisfies a given property, also described using a formal method. This approach has been widely adopted to verify the correctness of protocols and it appears useful in hardware/software property checking. It also allows the congruence between two successive refinement steps to be checked without using a simulation approach. For these reasons we refer to our methodology as a "formal codesign methodology".
Several solutions to the partitioning problem are proposed in the literature. Some use a graph model to represent the operations performed by devices and associate a cost with them [33]. Others perform the partitioning together with the implementation of the scheduling algorithm as, for instance, in [29], where the specification is made with a hardware description language and synthesis tools are used to estimate the costs. The basic idea of performing scheduling and partitioning together is to minimize the response time.
Our methodology divides the partitioning stage into two steps. The first (preclustering) is based only on the properties of the system and aims to reduce the complexity of the problem. This is obtained by a simple algorithm whose complexity is very low, especially compared with that of the mapping algorithm. The second step groups the remaining clusters and maps them onto the target architecture. The strategy used to reduce the complexity of mapping is based on minimization of the interaction among clusters.
Finally some problems dealing with mapping have been discussed, including the choice of the scheduling algorithms needed to allow hardware and software modules to coexist. Proper choice of the scheduling algorithm is, however, an open problem to which further studies must be devoted.
