This paper presents a system design methodology that allows the designer to specify, analyze and explore different hardware/software architectural solutions. To show the effectiveness of the methodology, currently implemented in TOSCA (a co-design prototype environment developed at CEFRIEL), the redesign of an industrial telecom controller of 16 links has been carried out. The experiment covered in particular the higher levels of system design, where functional cosimulation, architecture selection and hw/sw partitioning are performed. The final analysis revealed that the proposed co-design methodology produces a significant improvement in design time, even considering the lack of experience with the EDA tools provided to support the design space exploration phase.
Introduction
The concept of system design, or co-design, admits a variety of definitions depending on which aspects are considered relevant, the application field and the system granularity of the analysis [1] [2]. The novelty of co-design with respect to the design of pure hardware and pure software, which are well-known subjects, arises from the tight integration between the two types of design and from the global scope of the design constraints. Since such applications target high volumes, there is a pay-off for size, power and speed optimization techniques [3]. Within this scenario it is hard to identify optimal solutions, because exploring different design alternatives in a reasonable time is difficult to manage. Designers are therefore frequently trapped in conservative approaches that minimize the design effort and guarantee a running system, even if the result is far from an effective solution, which may be a mix of hardware and software. In the past few years, a number of institutions have started to investigate specific aspects of co-design, and some prototype methodologies and EDA environments are now becoming available within the research community and even commercially ([4], [5], [6], [7], [8], [9] and [10]). However, a general, widely applicable co-design methodology is still a long way off, and each approach is usually confined to an application domain or to a specific aspect of co-design. In our case we considered telecom applications, such as public switching and networking systems, where the function to be implemented is well known in advance and often standardized. Nevertheless, specifications change quite often during the design phase, which makes it necessary to maximize the flexibility of the solutions.
In this way it is possible to avoid complete re-designs in the attempt to keep the pace with the changed environmental constraints or changed standardized requirements which could not be foreseen at the beginning of the design.
According to the kind of data on which the design is focused, systems can be roughly classified as data dominated or control dominated. In many cases, however, this distinction is purely academic, because the same device can host both DSP modules (computation intensive) and sophisticated control parts with various layers of protocol handling. Our approach focuses mainly on control-dominated applications, or on the control parts of applications that also include a complex data path. The common technology for these cost-sensitive applications is based on dedicated hardware units (e.g., ASICs or macro-components) tightly coupled with a low-end processor, to balance the hardware cost while maintaining a certain design flexibility throughout the entire implementation process. Embedded products can take full advantage of VLSI technologies that implement the entire system on a chip, including a processor core, a significant amount of memory, I/O interface capability and a customizable ASIC part [10]. The software, written directly in assembly language or C, is frequently built around lightweight operating systems, and its size ranges from a few Kbytes to some hundred Kbytes. The emerging needs all point towards the increasing importance of design methodologies able to manage hardware-software development starting from the architectural level. One of the most important contributions of this research, and the focus of this paper, is the possibility of evaluating different design alternatives early, without completing the full synthesis cycle. To deal with the complexity of an exhaustive exploration of design alternatives, some heuristics have been introduced to keep the management of the design process at system level as simple as possible.
Particular attention has to be paid to setting up estimation strategies that predict the behavior and cost of the final implementation, in order to reduce the number of complete synthesis cycles. Our approach started from a co-design EDA prototype developed at CEFRIEL, called TOSCA [5]. The prototype was debugged and tested during the Best Practice Project SEED [11]. In its initial development stage the prototype focuses on the definition of the system requirements and specifications and analyzes them by considering both static and dynamic properties, in order to determine the best hardware/software partitioning (as outlined in section 2). The tool and the underlying methodology were applied to an industrial benchmark developed by Italtel. The design space exploration for this benchmark, the results obtained and the comparison with the same design done according to a traditional approach are described in detail in sections 3 and 4. Concluding remarks, showing the impact of the different design phases on the global system-level design activities, are reported in section 5.
The TOSCA design flow
We consider a class of embedded systems dedicated to a specific application built around a programmable instruction-set processor onto which runs a fixed software program, that interacts with some dedicated hardware modules, which in turn can interact with the external environment. These systems are usually characterized by a reactive behavior to the environment stimuli, also conditioned by real-time constraints, sometimes requiring a high-performance non-conventional design of the software. The TOSCA co-design flow aims at supporting the designer throughout all the design activities, by assisting the user in refining the design from the system-level representation down to its implementation while concurrently fulfilling the project requirements and goals. The most relevant aspects of the system design strategy defined can be summarized as follows:
• the target applications are control-dominated systems, typical, for instance, in the telecom switching systems domain;
• the set of tools defined and implemented aims at providing a single development environment with a common user interface, although all tools can be separately managed and tuned;
• the target architecture is constituted at this time by a single embedded processor connected to a set of hardware units, called co-processors, that perform the operations bound to hardware implementation;
• a single internal model has been specified which maintains all the information required by the different tools to perform simulation, validation, partitioning, scheduling and resource allocation.
A particular aspect of such a co-design flow, including system specification and design space exploration (see Figure 1), has been the focus of the SEED project.
Figure 1. The relation between the SEED and TOSCA co-design environments.
Let us briefly describe the single components of the high-level co-design methodology: system and constraints specifications, design validation and design exploration.
System specification
A new system design can be specified and managed throughout all the design phases by using the graphical/textual editor developed. This tool, called the Exploration Manager, represents the unique platform that allows interaction between the different tools of the co-design flow (screen dumps are reported in figure 3 and figure 10 ). The user can introduce the specification by graphically describing the system architecture, by defining modules that compose the system and the environment and their interconnections, then hierarchically specifying each single module. Finally, the leaves of the specification are described in a formal language, OCCAM2, through a textual editor. OCCAM2 allows the representation of both parallel and sequential execution of processes at any abstraction level. In general, the description of the overall digital system can be viewed as a collection of OCCAM2 processes corresponding to different modules of the description, which can be a single OCCAM2 procedure (that is the lowest level of granularity we assumed, for any co-design activity). Communication and synchronization between concurrent processes is obtained through blocking channels. Timing constraints cannot be expressed considering the standard OCCAM2 constructs. To include this important feature in our system-specification editor, we have extended the OCCAM2 syntax to describe the following cases:
• stimulus-stimulus: maximum admissible delay between two stimuli;
• stimulus-reaction: maximum delay between the stimulus and the corresponding system reaction;
• reaction-stimulus: maximum insensitivity time between a system reaction and the following stimulus;
• reaction-reaction: maximum time interval between two system reactions.
The OCCAM2 code contains some labels (tags) anchored to specific points within the specification and a constraint definition section grouping all the constraints defined over the labels. A screen dump of the timing constraints editor is shown in figure 10 . The entire specification is converted into OCCAM2 code, stored in an object oriented design database, that will be accessed by all tools managing and modifying the specification for design space exploration.
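To make the tag-based constraint scheme concrete, the following is a minimal Python sketch of how a constraint defined over two labels could be checked against a simulation trace. It is not the TOSCA syntax or implementation: the class `TimingConstraint`, the function `violations`, the tag names and the microsecond time unit are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

# The four constraint kinds listed in the text.
KINDS = {"stimulus-stimulus", "stimulus-reaction",
         "reaction-stimulus", "reaction-reaction"}


@dataclass(frozen=True)
class TimingConstraint:
    kind: str          # one of KINDS
    from_tag: str      # label anchored in the specification (hypothetical name)
    to_tag: str        # label anchored in the specification (hypothetical name)
    max_delay_us: int  # maximum admissible time between the tags (assumed unit)


def violations(constraint, trace):
    """Return (start, end) time pairs in `trace` that exceed the bound.

    `trace` is a time-ordered list of (time_us, tag) events. The most recent
    `from_tag` event is paired with the next `to_tag` event.
    """
    if constraint.kind not in KINDS:
        raise ValueError(f"unknown constraint kind: {constraint.kind}")
    found = []
    pending = None  # time of the most recent unmatched from_tag event
    for time_us, tag in trace:
        if tag == constraint.to_tag and pending is not None:
            if time_us - pending > constraint.max_delay_us:
                found.append((pending, time_us))
            pending = None
        if tag == constraint.from_tag:
            pending = time_us
    return found


# Hypothetical stimulus-reaction constraint: "ack" within 2000 us of "req".
c = TimingConstraint("stimulus-reaction", "req", "ack", 2000)
trace = [(0, "req"), (1000, "ack"), (10000, "req"), (15000, "ack")]
print(violations(c, trace))  # [(10000, 15000)]
```

Because the check handles the case where both tags are the same label, the same function also covers stimulus-stimulus and reaction-reaction constraints on a single repeated event.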
The TOSCA Co-Design Activities
Once the model has been realized, a simulation session follows, consisting of two main tasks: verifying the correctness of the OCCAM2 code and verifying that the system description complies with the system requirements. Functional validation is performed through a built-in OCCAM2 simulator, which also allows timing validation and the evaluation of dynamic metrics useful for system restructuring and hw/sw partitioning. After functional validation, the system must be analyzed against the constraints defined by the user, in order to explore the different hardware/software partitioning solutions available. This task is supported by a proper set of metrics. Static metrics provide information on the structure of the system specification, thus guiding the user in the application of transformations. Dynamic metrics allow the identification of those solutions that satisfy the user's constraints, thus driving system restructuring and optimization. Static metrics belong to three different classes: quality metrics, connectivity metrics and software metrics. Quality metrics estimate the system costs in terms of area for the hardware-bound parts and memory size for the software modules. Furthermore, they allow a first evaluation of performance in terms of the clock cycles necessary for execution. Connectivity metrics analyze all aspects related to communication. In particular, four metrics are defined:
• the connectivity metric, that computes the number of interconnections between two modules;
• the communication metric, that represents the amount of data that could be potentially transferred between two modules;
• the common access metric, that expresses the number of resources;
• communication constraints metric, that defines the amount of data exchanged between two modules under constraint.
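As an illustration of the first two connectivity metrics, the sketch below counts the channels between each pair of modules and sums their widths. It is only one possible reading of the definitions above: interpreting "amount of data that could be potentially transferred" as the summed channel width in bits is an assumption, and the function name and the example channel list are hypothetical.

```python
from collections import defaultdict


def connectivity_metrics(channels):
    """Compute per-module-pair connectivity and communication metrics.

    `channels` is a list of (module_a, module_b, width_bits) tuples, one per
    channel. Returns two dicts keyed by the unordered module pair: the number
    of channels, and the summed channel width in bits (assumed definition).
    """
    connectivity = defaultdict(int)
    communication = defaultdict(int)
    for a, b, width_bits in channels:
        pair = frozenset((a, b))
        connectivity[pair] += 1
        communication[pair] += width_bits
    return dict(connectivity), dict(communication)


# Hypothetical channel list; widths are invented for the example.
channels = [("HdlcRx", "FifoRx", 8), ("HdlcRx", "FifoRx", 1), ("FifoRx", "Dma", 8)]
conn, comm = connectivity_metrics(channels)
print(conn[frozenset(("HdlcRx", "FifoRx"))], comm[frozenset(("HdlcRx", "FifoRx"))])  # 2 9
```

Module pairs with high values for both metrics are natural candidates for being kept in the same partition, since splitting them would place heavy traffic on the hw/sw boundary.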
The software metrics we selected analyze the properties of the OCCAM2 code. Different metrics are considered: the number of declarations of each module; the dependencies among modules in terms of imported and exported procedures; the cohesion, i.e. the relation between procedures and data declarations; the intensity of coupling between modules; and finally the number of paths in the procedures. Dynamic metrics are obtained from the simulator and provide information on the functions of the modules and on how the different parts of the system are exercised (e.g. process latency, execution frequency, computational workload, …). In fact, one of the typical goals is to optimize the exploitation of all the system components.
The results obtained can be used in two ways. The metrics allow the evaluation of the model obtained during the specification phase, and they can be used to drive the partitioning process (both modularization and hw/sw binding). TOSCA also lets the user modify the specification through a user-driven, automated application of a set of formal transformations that guarantee by construction the correctness of the modified specification. Basically, they work on: simplification of the structure of the process tree, operator composition, reduction of the communication among modules and remodeling of the timing constraints. The designer should concentrate on balancing the complexity and depth of the modules, and on splitting or collapsing procedures where it is important to improve cohesion (i.e. to obtain a set of fairly independent modules). The results of these analyses provide a set of measures that represent the starting point for the following phases of the co-design flow, and in particular for partitioning. In this phase the actual assignment of parts of the specification to the hardware or software partitions is performed. The basic level of granularity is the OCCAM2 procedure. The partitioning task is driven by the results of the metrics evaluation, by simulation and by the design constraints associated with the entire design or with single procedures. The output of this phase is, for each OCCAM2 procedure, the partition it belongs to, such that all design constraints are satisfied both on the overall project and on the single procedures. After the partitioning has been selected, the design activity proceeds towards hardware and software cosynthesis and code/architecture optimizations. These design phases are outside the scope of the paper and are therefore not presented here.
A co-design experience: The ILC16 commercial device by Italtel
The experience accumulated to date on software and hardware synthesis technologies (and the related CAD tools) makes it possible to predict, with good confidence, their share of the global turnaround time. Unfortunately, the impact of design space exploration in terms of development time and design effectiveness is a controversial issue. One of the goals of the SEED project, apart from the validation of a co-design methodology, has been to collect information concerning the costs of the topmost design activities, such as specification, validation and design space exploration, for which accurate predictive models are still an open issue. The selected benchmark is a real-world industrial device provided by Italtel, a control-dominated system whose correctness is strongly dependent on timing constraints. Because the evolution of standards has to be balanced against performance, it is a natural candidate for a mixed hw/sw implementation.
The ILC16 structure
The benchmark (ILC16) is a link controller able to manage 16 synchronous/asynchronous links under the HDLC protocol. The original implementation is fully hardware and suffers from some limitations, such as a lack of flexibility with respect to traffic fluctuations and to any protocol modification. As shown in figure 2, the device is connected through a bus to an external CPU, which is the top-level controller for the entire communication system, a private memory for ILC16 (Esram), a global memory for the system (Ram) and some I/O data lines (Input/Output). In summary, the purpose of the ILC16 device is to implement the set of communication links based on the HDLC protocol, while managing the interaction with the rest of the system, including the transfer of data to/from the Ram. The purpose of each link is to recognize the HDLC protocol and to transfer data between the ILC16 and the memory system. The control part of the system configures the links and verifies the correct behavior of the system and the interaction of ILC16 with the surrounding environment. Starting from the official textual specification provided by Italtel, a model of the system has been created by using the Exploration Manager of TOSCA (a screen dump is reported in figure 3). By using some of the formal transformations, the system has been modified to increase the granularity of modules, to reduce the inter-module data exchange and to accommodate the flexibility requirements. The final description of the system, constituting the basis for the following hw/sw exploration, is composed of 800+1200*N OCCAM2 lines, where N is the number of links. Five basic macro-modules have been identified (see figure 4):
• Buffer Description Manager, i.e. a data transfer system to move data from/to the Esram, possibly via Dma and to interface the BUS.
• Risc, which is a microcontroller managing the links and the interaction with the environment.
• A link Interface, connected to the set of links, to manage the HDLC frames.
• A set of 16 Rx/Tx Links.
• The ILC16 External Bus Interface.
Let us briefly describe the specification of the five macro-modules.
The system transmits and receives data according to the semantics defined by the HDLC protocol. The Reception link interprets the received data and transmits them to the Dma module, which manages the allocation of the data into the system memory. The link structure, represented in figure 5, is composed of two sub-modules: the Hdlc module and the Dma module. These modules work at different speeds; thus, a buffer memory is necessary to synchronize the data exchange. Since the Reception link receives a sequence of bits, which must then be manipulated as bytes, the Hdlc module is partitioned into two separate modules: the HdlcRx and the BitToByte. The latter transforms the sequence of bits into a flow of bytes. The HdlcRx module receives a flow of bytes and, after the identification of a complete frame, it computes the CRC and detects the frame condition (Abort, Correct, Incorrect, Non Aligned). This information is sent through the Fifo module to the Dma module, which stores it into the memory (Ram). Most of these operations are computed under the supervision of the Risc module, which receives and transmits the memory addresses, reports the end of the frame and generates all control signals related to the frame conditions.
The functionality of the Transmission link is dual to that of the Reception link: during transmission, the channel retrieves the data from Ram (DmaTx module) and transmits them to the HdlcTx; the HdlcTx module produces the frames following the protocol format. As in the previous case, a memory buffer (FifoTx) is used to decouple the DmaToFifo and the HdlcTx. Since data have to be transmitted at a given frequency, a Sincro module is connected to the transmission sequence. Furthermore, since each byte of data is encoded with a parity bit, the BDM module computes the parity bit and verifies that each byte read has been correctly retrieved (in case of error an interrupt is generated). The Link Interface module constitutes the interface between the control system and the links. This module has two functions:
• a multiplexing of the control signals directed to the links;
• a filtering of the concurrent channels signals directed to the control modules.
In particular, the Link Interface module has to manage the signals for the bus control, the selection of the transmission channel speed, the memory addresses, the error signals, and so on. The Bus Interface module, which represents the interface between the ILC16 and the external Bus, converts the ILC16 signals into the system bus format and manages the concurrency. To better verify the functionality expressed through the specification of the ILC16, its interaction with the rest of the system has also been taken into account. In fact, this interaction is constrained by a set of functional and timing conditions. Such conditions are associated with 6 modules: the Ram Memory, the Input and Output, the External Mpu, the Bus Arbiter and the Interrupt Controller.
Timing Characterization
Given the overall model of the specification, it has been necessary to identify those constraints that the system must satisfy in addition to the purely functional correctness:
• The device must perform the operations of sending and receiving with a rate fixed by the HDLC communication protocol.
• During normal operation the FifoRx device must not overflow and the FifoTx device must not be empty; these situations are exceptions to be avoided.
• The links controller must answer the requests coming from the links as fast as possible; the speed of the answers is related to compliance with the first requirement.
• The device must be as versatile as possible; a software implementation of the device functionality is recommended.
These textual requirements must be translated into formal constraints to be applied to the functional model (i.e. the OCCAM2 description). Two classes of timing constraints have been identified: hard and soft constraints. A hard constraint represents a timing characterization that has to be satisfied to obtain a correct functionality of the device. A soft constraint, on the contrary, indicates that a violation of the timing specification does not cause a functional failure but only a performance degradation. Starting from the original specification documents, the timing constraints reported in Table 1 have been identified. The constraints related to the transmission frequency refer to the communication between the modules ByteToBit and Sincro for the links. The correct behavior of the ILC16 transmitting links is based on the condition that the Sincro module is slower than the rest of the transmitting link (Sincro is the bottleneck of the transmission). On the contrary, the reception frequency constraint is applied to the output communication process of the Sincro connected to the BitToByte; this constraint ensures the correct working of the receiving links. The constraints on the latency between Bus request and Bus grant, on the Reading/Writing delay inside the Esram and on the latency between a memory address request and its availability translate the design requirement of answering DMA requests quickly. The constraints on the number of over-run and under-run events check that no over-run or under-run error appears in the Fifo. All the relevant timing-related issues have been introduced in the design through the graphical editor, by inserting tags in the description and by specifying the timing constraints in an overall table (an example is shown in figure 10).
Design space exploration
The first functional specification was initially manipulated statically in order to better balance the size and structure of the over 40 modules composing the OCCAM2 specification. In particular, the parameters considered for each module are the number of lines, the number of paths and the maximum and average depth. The cohesion in terms of potential data exchange among the modules has also been evaluated to find candidate modules for possible collapsing. At the end of this phase, the system was considered ready for the evaluation of alternative hw/sw partitionings. The design scenario considered depends on the following constraints and parameters. As reported in Table 1, soft and hard constraints have to be considered. The hard ones are data rates. The soft constraints concern the bus request/grant, the Dma access to the memory and the Dma request. Each valid solution must satisfy all the hard constraints, while the soft constraints are typically used to select a solution among the acceptable ones. Globally the system is composed of 60 procedures: 44 constitute the ILC16, while the rest are necessary to model the environment. Two border solutions for the ILC16 procedures were analyzed in order to set the bounds on the design space: 1. fully hardware implementation; 2. fully software implementation (note that this solution has been considered only for the sake of completeness, as a term of comparison). Neither of them is acceptable: the fully software solution does not meet all the timing constraints imposed on the system, while the fully hardware solution satisfies all the constraints but does not cope with other design requirements (such as implementation cost, flexibility, …). Thus, an intermediate partitioning has to be found to produce an optimal and more flexible solution.
From a theoretical point of view, each of the 44 modules could be realized in either hardware or software, so the associated design space is rather large, spanning $2^{44}$ = 16 Tera different solutions. To cope with the complexity of analyzing such a wide design space, the expertise of the designer and the use of high-level co-simulation have been of paramount importance in shrinking the design space to an affordable size. The following basic criteria have been considered to reduce the solution tree:
• It is possible to take advantage of the symmetry characteristics of the ILC16 system. In particular, the device has two identical communication links and it is reasonable to implement both in the same manner (the solutions space is reduced to $2^{26}$ = 64 Mega solutions).
• Each link is constituted by a receiving section and a transmitting section. These two parts are exactly equivalent and are realized using the same physical partitions (the solutions space is reduced to $2^{17}$ = 128 K solutions).
• Information coming from connectivity dynamic metrics suggested that the data flow through the links is larger than the data flow between the links and the control system. The results of these dynamic metrics analysis, performed using the OCCAM2 simulator, pointed out that it is possible to use different implementations for the links and the control system (typically the control system is implemented in software).
• According to the designer's suggestions, the OCCAM2 modules implementing the Dma (Tx and Rx), the SincroTx and the BusInterface should belong to the hardware partition.
• The ILC16 is a high-performance device. In order to keep this performance, only one module for each communication section is implemented in software.
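The design space sizes quoted in the symmetry criteria above follow from counting two possible bindings (hw or sw) per free module. The short Python check below reproduces the three figures, reading "Tera", "Mega" and "K" as binary prefixes; the function name is introduced here for illustration only.

```python
def design_space_size(free_modules):
    """Number of hw/sw bindings when each free module has two choices."""
    return 2 ** free_modules


# Exponents taken from the text: 44 modules, then 26 and 17 after the
# two symmetry reductions.
for n, count, unit in ((44, 16, 2 ** 40), (26, 64, 2 ** 20), (17, 128, 2 ** 10)):
    assert design_space_size(n) == count * unit
    print(n, design_space_size(n))
```

Even the reduced space of $2^{17}$ bindings is far too large to simulate exhaustively, which is why the remaining criteria rely on metrics and on the designer's knowledge rather than on further enumeration.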
In conclusion, the application of the above criteria (symmetry, cohesion and system functionality) pruned the design space to 32 relevant different solutions, since only five macro modules (named a-e) remain to be bound to hw or sw: control_system, link_interface, FIFO, HDLC and BitToByte/ByteToBit. Starting from the module grouping so obtained, and according to the design constraints defined above, three different scenarios have been considered to mimic real operating conditions. Each of them is associated with different technological characterizations. The first scenario considers 6 different characterizations where the hardware frequency is fixed to 50 MHz, while the microprocessor frequency, the link speed and the Fifo size are chosen from {4 MHz, 10 MHz, 25 MHz}, {128 Kbit/s, 64 Kbit/s} and {32 Byte, 64 Byte}, respectively. Due to space limitations, only this case is detailed here. The analysis starts from an initial configuration where all modules (a-e) are hardware-bound and moves one module at a time into the sw partition: 32 possible solutions can be generated. Figure 11 reports the tree of the possible alternatives, where each box represents a new migration of the corresponding module into the sw implementation domain. For instance, the solution identified by the path ab corresponds to a software implementation of the control_system and link_interface, while the rest of the device is hw. Since some of the possible solutions require more than one module composing the link to be implemented in software, contrary to the system functionality requirement, such configurations can be eliminated in advance (gray area of figure 11); thus, the design space size is reduced to 16 possible configurations.
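The reduction from 32 to 16 configurations can be reproduced with a small enumeration. The Python sketch below is an illustration, not the TOSCA procedure: it assumes that the modules "composing the link" are FIFO, HDLC and BitToByte/ByteToBit, an interpretation that is not stated explicitly in the text but that yields the quoted count.

```python
from itertools import product

# Macro modules a-e from the text.
MODULES = ["control_system", "link_interface", "FIFO", "HDLC",
           "BitToByte/ByteToBit"]
# Assumption: these three are the modules "composing the link".
LINK_MODULES = {"FIFO", "HDLC", "BitToByte/ByteToBit"}


def candidate_partitions():
    """Yield the set of software-bound modules for each admissible binding."""
    for bits in product((False, True), repeat=len(MODULES)):
        software = {m for m, in_sw in zip(MODULES, bits) if in_sw}
        # System functionality criterion: at most one link module in software.
        if len(software & LINK_MODULES) <= 1:
            yield software


all_bindings = 2 ** len(MODULES)
admissible = list(candidate_partitions())
print(all_bindings, len(admissible))  # 32 16
```

Under this assumption the count factors as 4 bindings for control_system and link_interface times 4 admissible bindings for the three link modules (none in software, or exactly one of the three).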
By performing the area estimation for all the ILC16 modules, and recalling that the upper bound for the area is set to 110000 gates for the hardware section and to 1100 equivalent gates for the software section, more solutions can be eliminated (gray boxes in Table 2).
Table 2. Estimated area over the maximum constraint for the different partitions.
The OCCAM2 testbench related to the ILC16 device modules allows the simulation of typical network traffic. The simulation covers the reception of 20 frames (14 data frames of approximately 180 bytes each and 6 control frames) and the transmission of 15 frames of approximately 200 bytes each. For instance, at a receiving/transmitting rate of 64 Kbit/s the simulations represent the activity of the ILC16 device over a time interval of approximately 287 ms. For all the simulations, the Motorola 68000 target microprocessor model has been used.
The simulator is quite efficient in terms of speed: for instance, 16 ms of real operation of the device takes about 1 s of simulation time on an Ultra SPARC running Solaris 2.5. Figure 12 shows a graph that summarizes the most relevant valid solutions discovered using the clock frequency of the microprocessor or the data transmission/receiving rate of the device as design variables. A comparative analysis that also included the soft timing constraints revealed that the second solution is preferable, since it is more flexible and less expensive. In general, we can observe that the microprocessor clock strongly affects the possibility of implementing part of the system in software while meeting the timing requirements. For the second scenario, the goal was to explore in detail the influence of the technology (in particular the microprocessor) on the satisfaction of the timing constraints of a given partitioning. In particular, the speed of the channels is 1024 Kbit/s and the processor frequency varies from 50 MHz (the lower limit for the chosen partition) up to 200 MHz. The results are summarized in figure 13. It should be emphasized that only a 200 MHz microprocessor can satisfy the required performance. This solution has been discarded by the designers since, even if technologically feasible, it is too expensive with respect to the budget for this class of devices. Nevertheless, this analysis has been important to determine the class of microprocessor to be used and, in turn, the technology range of the overall system. The third and last scenario fixes the hw frequency (50 MHz), the microprocessor frequency (75 MHz), the link speed (512 Kbit/s) and the Fifo size (32 Byte), and explores the solution space focusing only on the aspects related to the partitioning procedure. The simulations have been carried out as in scenario 1, and to compare alternative solutions we used the following criteria:
• sw and hw area;
• Amount of satisfied constraints on area and timing;
• Severity in violation of timing constraints (normalized sum of the system delays).
A weighted sum of the software and timing constraint violations has been used to select candidate solutions. However, the ultimate decision is left to the designer, who has the responsibility of tuning the weights of such a goal function, i.e. of deciding the relative importance of speed and performance. Based on these results we drew the following conclusions. The design space exploration based on the tool provided within the SEED project can be very useful to avoid wrong partitioning or unnecessary over-sizing (i.e., cost) of the physical parameters of the system. Moreover, it is very useful and profitable to know whether the device can deliver the required performance using a lower microprocessor clock frequency. The application field for this kind of analysis covers applications where the design requirements in terms of performance are not too stringent. The reason is essentially that the timing estimations provided by the tool are based on a well-defined architecture and on a fixed translation scheme from the OCCAM2 specification code to the software, and these, in general, may not be the most efficient implementation for all classes of applications. Furthermore, let us underline that the tool adopts a very conservative approach to timing estimation; therefore a real device implementation will often produce better results.
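The weighted-sum selection mentioned at the start of this discussion can be sketched as follows. The exact goal function used in the experiment is not given in the text, so everything in this Python example is an assumption: the two normalized terms, the default weights, the function names and the sample figures are invented for illustration.

```python
def goal(solution, w_area=0.5, w_timing=0.5):
    """Weighted-sum cost of one partitioning (lower is better).

    `solution` holds two values assumed to be normalized to [0, 1]:
    'sw_area' (software size relative to its bound) and
    'timing_violation' (normalized sum of the delays beyond the constraints).
    """
    return w_area * solution["sw_area"] + w_timing * solution["timing_violation"]


def rank(solutions, **weights):
    """Return solution names ordered from lowest to highest cost."""
    return sorted(solutions, key=lambda name: goal(solutions[name], **weights))


# Invented figures for two hypothetical partitionings.
solutions = {
    "x": {"sw_area": 0.2, "timing_violation": 0.6},
    "y": {"sw_area": 0.8, "timing_violation": 0.1},
}
print(rank(solutions, w_area=0.9, w_timing=0.1))  # ['x', 'y']
print(rank(solutions, w_area=0.1, w_timing=0.9))  # ['y', 'x']
```

The two calls show why the final choice stays with the designer: the same pair of candidates is ranked in opposite order depending on how the weights are tuned.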
Concluding remarks
The TOSCA design environment has been stressed by modeling a real device (called ILC16), commercialized by Italtel and used as test vehicle for the SEED Esprit project. The ILC16 component is a data-link controller for sixteen asynchronous/synchronous data streams based on the HDLC protocol. The original design allocates the management of the channels to a RISC CPU core cell, while the HDLC protocol processing is hardware-bound; both sections are embedded in the same chip. The system has been reverse engineered using the TOSCA environment, producing more than 4K lines of OCCAM2 code. The various components of the system were initially allocated to either the hardware or the software domain and then simulated to verify the functionality and the fulfillment of the timing constraints. Different scenarios have been explored before committing to the final hardware/software partitioning, to simulate the possible designer's needs and constraints: 1. Full freedom in deciding technology and implementation parameters; 2. Given a hw/sw configuration, the implementation technology allowing the fulfillment of all the timing constraints has been investigated; 3. Given a technology, the focus has been on the selection of the optimal hw/sw partitioning. During the ILC16 reengineering, design-time information concerning the co-design process has been collected. Table 3 summarizes the allocation of the total manpower (excluding the synthesis stage) to the different design stages, including the initial effort to become familiar with the TOSCA environment. The effort distribution over the activities related only to the design process (84% of the time reported in Table 3) is shown in Figure 14. These results are important to get feedback on the effectiveness of the proposed methodology for design entry and design space exploration. In fact, our results have also been compared with the design process followed by the original ILC16 designers.
The main difference is that, in the original design, the design space exploration was not performed at all, since the designers, under time-to-market pressure, embraced the conservative approach of implementing most of the system in hardware. This position, together with the absence of any system-level simulator such as the one embedded in the TOSCA environment, led the designers to analyze the system for functional debugging and implementation at the same level of abstraction, after the VHDL coding had been completed. This does not allow the separation of the two distinct phases of architectural validation and hardware description specification simulation, as would be necessary in a correct design environment, and thus requires several iterations in the design process. Hence, we can only conclude that the activities of functional debugging, design space exploration and VHDL simulation in the original design required more or less times the manpower required by the co-design process described here. Therefore, the introduction of this type of co-design activity within industrial design flows is fully justified in terms of design time. Nevertheless, the main lesson learned during the SEED project is that co-design should still be a user-driven process, where it is necessary to capitalize on the designer's experience (e.g. to allocate to software the part of the specification related to evolving standards). The tools can be valuable to explore and compare alternative solutions when the number of modules managed is below a hundred. Moreover, since design specifications can quickly change during the design process, the design turnaround time is characterized by a twofold nature:
• Long-term designs (1-3 years) with a continuous upgrade of the specification (e.g. an evolving standard with respect to an almost stable kernel), very sensitive to the cost of modification and development;
• Short-term designs (less than one year), which are rather implementation-cost sensitive, where fast exploration of alternative designs is desirable but difficult to achieve due to the time-to-market pressure.
In both cases, system-level activities such as co-design seem to be the only viable strategy to meet the increasing number of design constraints. Currently, most of our effort is devoted to providing the environment with a proper analysis strategy, for both the hardware and the software domains, that includes power consumption estimation during the earlier stages of the design as one of the relevant figures of merit for evaluating the system architectural alternatives. This requires the possibility of back-annotating the information obtained after the software
