This paper deals with the system-level performance analysis and optimization of a class of digital systems we call mixed asynchronous-synchronous systems. In such a system, each computation module is either synchronous or asynchronous, and the communication between all modules is assumed to be data-driven. In order to adequately describe the timing of such architectures, we introduce a graph model called MASS, which is based on several extensions of timed marked graphs. The first extension is that the node set V is partitioned into synchronous and asynchronous nodes. A synchronous node can only fire at ticks of its local module clock. Based on these extensions, we analyze the behavior of MASS, in particular its period, periodicity, and maximal throughput rate.
Introduction
This paper is concerned with the performance analysis and optimization of a class of digital systems we call mixed asynchronous-synchronous systems. In such a system, each module may be either synchronous, i.e., synchronously clocked, or asynchronous, whereas the communication and operation of all modules is data-driven.
Motivation
Today, architectures that consist of a combination of dedicated circuits, for example VLSI circuits, and general purpose components (memory blocks, A/D and D/A converters, DSPs, etc.) are becoming increasingly attractive and can be built at relatively low costs. Very often, they contain synchronous as well as asynchronous components.
Synchronously clocked circuit modules compute functions by separating stages of combinational logic with latches or registers that are clocked with a globally distributed clock. The advantages of synchronous design styles can be summarized as follows:
Maturity of existing CAD design tools, no circuit overhead due to handshaking or completion signal generation.
Asynchronous components do not employ a global clock to enforce system activities. If they use handshake signals to sequence operations, they are called self-timed [1]. Such modules are expected to have the following properties:
Average-case instead of worst-case performance: In a synchronous system, the maximal clock rate is determined by the slowest computation module.
Delay-insensitivity: Communication based on a self-timed handshake is reliable due to the paradigm of delay-insensitivity [2, 3, 4]. Hence, physical design variations may have no influence on the correctness of the circuit, and better technology migration is possible. In synchronous systems, typical problems are clock skew and critical path estimation.
Modeling the environment: Especially in the case of embedded electronic systems (see, e.g., [5]), the behavior of the computation environment, for example sensors, actuators, and mechanical components, can very often be modeled using asynchronous computation modules.
For a more complete summary of the advantages of asynchronous systems, and for an overview of the state of the art in design methodologies for asynchronous circuits, we refer to the collection of excellent papers edited in [6, 7]. However, at a certain point in the design hierarchy, the communication costs of building a self-timed circuit (e.g., for dual-rail coding [1], differential logic [8], or handshake logic) are no longer justified by the possible gains.
As a consequence of this discussion, it is desirable to model and design systems that consist of communicating synchronous and asynchronous components. On the one hand, this may be the appropriate model for an (embedded) electronic system and its possibly asynchronous environment. On the other hand, the best structure for a system may not be a monolithic synchronous design but a mixture of asynchronous and synchronous design styles. Some of these architectures are also referred to as globally asynchronous locally synchronous (GALS) systems [9, 10]. For a discussion of their advantages and disadvantages, see [11]. Aspects of describing such systems can be found in [1], [9], [12], and [10]. Whereas [1] and [9] describe implementation aspects of building GALS systems, the approach presented in [10] is language oriented: there, the goal is to derive a circuit from the high-level language SIGNAL. Finally, the work described in [13] concentrates on synthesizing a synchronous finite state machine from a mixed synchronous/asynchronous state graph with the same behavior.
Example 1. Fig. 1 shows a typical example of a mixed asynchronous-synchronous system: a disk drive application in which a microcontroller (MC) controls a servo. The microcontroller transfers a position to the servo. As soon as the servo has reached this position, it issues a completion signal to the microcontroller. In the meantime, the microcontroller is busy with other tasks (e.g., serving other requests). We assume that the microcontroller polls for the completion of the servo every N cycles of its clock period T. Obviously, the servo represents an asynchronous system: it may start and finish serving a request at time instances that are not multiples of the polling period N T of the microcontroller.
Goals
Unfortunately, none of the work described above proposes a model that allows the exact timing behavior of a system containing synchronous and asynchronous modules to be analyzed. Here, we are concerned with an exact analysis of performance, e.g., determining the minimal period (or maximal throughput rate) achievable by such systems. Primarily, our goal is to develop a timing model that satisfies the following requirements:
Simplicity and implementation independence: In order to be amenable to CAD, the model should be simple and not focus on one particular implementation style or design methodology (e.g., circuit delay model …).

The theory of mixed asynchronous-synchronous systems as introduced here combines aspects of asynchronous as well as synchronous performance analysis. In the realm of synchronous architecture design, Leiserson et al. [14] have developed a theory for the analysis and optimization of signal flow graphs (SFG).
In the domain of asynchronous systems, graph models such as timed Petri nets have been investigated for performance analysis [15, 16, 17]. In the case of deterministic computation times, the corresponding Petri net is decision-free and can be represented by a marked graph [18]; for a classification, see, e.g., [19].
In [20] and [16], it is shown that under certain conditions, systems modeled by marked graphs have an asymptotically periodic behavior and that the minimal period of such a system is given by the maximal cycle mean. A detailed analysis of this class of event systems is contained in [21].
Based on these results, we will introduce a graph model called MASS (mixed asynchronous-synchronous systems), which is an extended timed marked graph model. The first extension is that the node set of a MASS is partitioned into asynchronous and synchronous nodes. Also, in contrast to marked graphs, in which a computation module can commence its operation as soon as all incoming arcs contain valid data, a synchronous node in a MASS can only start or finish its computation at a tick of its local module clock.
Models of computation

Marked graphs and their unfolding
The main purpose of this section is to introduce some notation and to recall the computational model associated with marked graphs, because this model is the basis for the extensions described in this paper. To be more precise about the above model, e.g., with respect to the initial conditions and the properties of the collections, the concept of (event-)unfolding is introduced, which represents the set of all evolutions of a timed marked graph and includes information on the corresponding scheduling. Most of the results described in this paper will be derived using the notion of unfolding. Here and in the following, $\mathbb{R}$ and $\mathbb{Z}$ denote the sets of real and integer numbers, respectively. … 3. to each arc in $A$ the weight of the corresponding arc in $G$, i.e., $h$: …
A node $v_i(k)$ of the unfolding represents the $k$th firing of node $v_i$ in the marked graph. An arc from $v_i(k)$ to $v_j(l)$ expresses the fact that the $l$th firing of node $v_j$ can take place only after the $k$th firing of node $v_i$. The nodes $v_i(k)$ with $k \le 0$ represent the initial conditions of the marked graph, i.e., the placement of $d_{ij}$ tokens into the collection corresponding to arc $(v_i, v_j)$.
The scheduling of a timed marked graph, introduced informally above, can now be described more precisely. An admissible schedule is defined as follows:
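Definition 3 (Admissible Schedule). An admissible schedule function $\tau: V \to \mathbb{R}_{\ge 0}$ satisfies:
1. $\tau_i(k) = 0$ for all $v_i \in V$ and all $k \le 0$;
2. $\tau_j(k) \ge \tau_i(k - d_{ij}) + h_{ij}$ for all $(v_i, v_j) \in A$ and all $k > 0$.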
Here, $\tau_j(k)$ denotes the time when node $v_j$ fires for the $k$th time. The first condition in Definition 3 captures the initial timing conditions of the token game: all initial tokens are placed into the collections at time 0.
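Evaluating the arc inequalities of Definition 3 with equality along the unfolding yields the earliest admissible schedule. The following Python sketch computes it over a finite horizon $K$. It is an illustration under stated assumptions, not the authors' implementation: arcs are hypothetical tuples `(i, j, d_ij, h_ij)` with 0-based node indices, and the token-free arcs are assumed to contain no cycle (such a cycle would deadlock the graph).

```python
from fractions import Fraction
from graphlib import TopologicalSorter


def earliest_schedule(n, arcs, K):
    """Earliest firing times tau[i][k], k = 1..K, of a timed marked graph.

    arcs: list of (i, j, d_ij, h_ij). Initial conditions as in Definition 3:
    tau_i(k) = 0 for all k <= 0 (all initial tokens are present at time 0).
    """
    preds = {j: [] for j in range(n)}
    token_free = {j: set() for j in range(n)}
    for i, j, d, h in arcs:
        preds[j].append((i, d, Fraction(h)))
        if d == 0:
            token_free[j].add(i)
    # Within one firing index k, token-free arcs impose an evaluation order.
    # They must form a DAG; otherwise graphlib.CycleError is raised.
    order = list(TopologicalSorter(token_free).static_order())

    tau = [[Fraction(0)] * (K + 1) for _ in range(n)]  # column 0 stands for k <= 0

    def fired(i, k):
        return tau[i][k] if k > 0 else Fraction(0)

    for k in range(1, K + 1):
        for j in order:
            # unfolding arc (v_i(k - d_ij), v_j(k)): tau_j(k) >= tau_i(k - d_ij) + h_ij
            tau[j][k] = max((fired(i, k - d) + h for i, d, h in preds[j]),
                            default=Fraction(0))
    return tau


# Two-node cycle: v0 -> v1 without tokens (h = 2), v1 -> v0 with one token (h = 1).
tau = earliest_schedule(2, [(0, 1, 0, 2), (1, 0, 1, 1)], K=3)
print([int(t) for t in tau[0][1:]], [int(t) for t in tau[1][1:]])  # [1, 4, 7] [3, 6, 9]
```

In this small example the firing times grow by 3 per firing, which is the cycle mean $(2 + 1)/1$ of the only cycle.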
An example explaining this model of computation is given next.
Example 3. Consider the marked graph shown on the left-hand side of Fig. 2 again. The corresponding unfolding is shown in Fig. 3.
Using the definition of the maximal cycle mean $P_{cm}$ of $G$,

$$P_{cm} = \max_{W \in C(G)} \frac{\sum_{(v_i, v_j) \in W} h_{ij}}{\sum_{(v_i, v_j) \in W} d_{ij}},$$

where $W$ contains all arcs of a directed cycle and $C(G)$ contains all simple directed cycles of $G$, the following statements hold: There are admissible 1-periodic schedules for all $P \ge P_{cm}$. A 1-periodic admissible schedule with $P = P_{cm}$ has minimal period, i.e., $P_{cm} = P_{min}$.
Proof: The detailed proof is given in [22]. The first equation (2) … (4) is admissible. The proof of the maximal cycle mean (3) uses the dual representation of (2).
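For small graphs, the maximal cycle mean can be evaluated directly from its definition by enumerating all simple directed cycles. The Python sketch below does this with exact rational arithmetic. The function names and the arc tuple format `(i, j, d_ij, h_ij)` are hypothetical, and the number of simple cycles can grow exponentially, so the sketch illustrates the formula rather than an efficient algorithm.

```python
from fractions import Fraction


def simple_cycles(n, arcs):
    """Enumerate simple directed cycles as lists of arc indices (small graphs only)."""
    out = {v: [] for v in range(n)}
    for idx, (i, _j, _d, _h) in enumerate(arcs):
        out[i].append(idx)
    cycles = []

    def extend(start, node, path, visited):
        for idx in out[node]:
            j = arcs[idx][1]
            if j == start:
                cycles.append(path + [idx])
            elif j > start and j not in visited:
                extend(start, j, path + [idx], visited | {j})

    for s in range(n):  # each cycle is found exactly once, rooted at its smallest node
        extend(s, s, [], {s})
    return cycles


def max_cycle_mean(n, arcs):
    """P_cm = max over simple cycles W of sum(h_ij on W) / sum(d_ij on W).

    Returns None for an acyclic graph.
    """
    best = None
    for cycle in simple_cycles(n, arcs):
        h = sum(Fraction(arcs[a][3]) for a in cycle)
        d = sum(arcs[a][2] for a in cycle)
        if d == 0:
            raise ValueError("token-free cycle: the marked graph deadlocks")
        best = h / d if best is None else max(best, h / d)
    return best


fig7 = [(0, 1, 1, Fraction("1.9")), (1, 0, 1, Fraction("1.7"))]
print(max_cycle_mean(2, fig7))  # 9/5
```

The usage line takes the holding times of Fig. 7 ($h_{12} = 1.9$, $h_{21} = 1.7$) and assumes two tokens on the single cycle, which is consistent with the value $P_{cm}(G_A) = 1.8$ quoted in Example 11.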
$P_{cm}$ is the maximum, over all directed cycles of $G$, of the ratio between the accumulated arc weights $h_{ij}$ and the accumulated number of tokens $d_{ij}$ along the cycle. Using the above theorem, the maximal-rate schedule can also be determined by solving the linear program

$$P_{cm} = \min\{P : \tau_j - \tau_i \ge h_{ij} - P\, d_{ij} \;\; \forall (v_i, v_j) \in A\}.$$

In the following, we will analyze the performance of mixed asynchronous-synchronous systems. But first, we introduce the required extensions to the timed marked graph model in order to adequately describe such systems.
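For a fixed $P$, the constraints of the linear program form a longest-path problem: offsets $\tau_i$ with $\tau_j - \tau_i \ge h_{ij} - P d_{ij}$ exist exactly when the graph weighted with $h_{ij} - P d_{ij}$ has no positive-weight cycle. The Python sketch below tests this with a Bellman-Ford style relaxation, a standard substitute technique and not necessarily the method used by the authors; the function name and arc tuples are hypothetical. With the returned offsets, the 1-periodic schedule $\tau_i(k) = \tau_i + kP$ satisfies every arc constraint, because $\tau_j(k) - \tau_i(k - d_{ij}) = \tau_j - \tau_i + P d_{ij} \ge h_{ij}$.

```python
from fractions import Fraction


def periodic_offsets(n, arcs, P):
    """Offsets tau_i with tau_j - tau_i >= h_ij - P*d_ij for all arcs (i, j, d_ij, h_ij).

    Returns longest-path values from a virtual source, or None if the weights
    h_ij - P*d_ij contain a positive cycle, i.e. if P < P_cm.
    """
    P = Fraction(P)
    tau = [Fraction(0)] * n  # virtual source connected to every node with weight 0
    for _ in range(n):
        changed = False
        for i, j, d, h in arcs:
            w = Fraction(h) - P * d
            if tau[i] + w > tau[j]:
                tau[j] = tau[i] + w
                changed = True
        if not changed:
            return tau
    # After n passes, any further improvement reveals a positive cycle.
    for i, j, d, h in arcs:
        if tau[i] + Fraction(h) - P * d > tau[j]:
            return None
    return tau


arcs = [(0, 1, 0, 2), (1, 0, 1, 1)]
print(periodic_offsets(2, arcs, 3))  # feasible: P = 3 equals P_cm here
print(periodic_offsets(2, arcs, 2))  # None: P = 2 is below P_cm
```

Combined with a bisection on $P$, the same test approximates $P_{cm}$ without enumerating cycles.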
Modeling of mixed asynchronous-synchronous systems
We assume that in a mixed asynchronous-synchronous system, a synchronous node can only send or receive data at its local clock ticks. The formal model of MASS to be defined next is an extended marked graph model.
Definition 6 (MASS). A mixed asynchronous-synchronous system (MASS) denotes an extended marked graph $G = (V, A, d, h, r, p)$. The set of nodes $V$ is partitioned into disjoint subsets $V_A$ and $V_S$, corresponding to asynchronous and synchronous nodes, respectively. In addition, the function $r: V \to \mathbb{N}$ assigns a clock period $r_i$, and the function $p: V \to \mathbb{R}$ assigns a clock phase $0 \le p_i < r_i$ to each node $v_i \in V$. In the unfolding of $G$, each synchronous node $v_i(k)$ is assigned the clock period $r_i$ and the clock phase $p_i$.
The following example clarifies the chosen representation of mixed asynchronous-synchronous systems.
Example 5. Again, the system described in Example 2 is considered. Now, we suppose that node $v_2$ is a synchronous node with local clock phase $p_2 = 0.1$ and clock period $r_2 = 1$; hence, the clock period is normalized to 1. Fig. 4 shows the corresponding extended graph.
The major difference from the asynchronous case is the token game played on the MASS. The time instances $\tau_j(k)$ at which a synchronous node can complete an operation are now constrained as follows:
This restriction is motivated by the fact that a synchronous module can deliver a value only at its local clock ticks. Therefore, the firing of the node is delayed until the next clock event. For asynchronous nodes, clock periods and clock phases play no role. However, by letting $r_i = 1$ and $p_i = 0$ for all $v_i \in V_A$, we can unify the description of the behavior of synchronous and asynchronous nodes.
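The delay to the next clock event can be written as a small function: a synchronous node with clock period $r_j$ and phase $p_j$ that becomes enabled at time $a$ completes at the first tick $p_j + n r_j$ ($n$ an integer) that is not earlier than $a$. The Python sketch below assumes that a node enabled exactly at a tick fires at that tick; the name `next_tick` is hypothetical.

```python
import math
from fractions import Fraction


def next_tick(a, r, p):
    """First clock tick p + n*r (n an integer) that is not earlier than time a."""
    a, r, p = Fraction(a), Fraction(r), Fraction(p)
    return p + r * math.ceil((a - p) / r)


# Enabling time 8.5, clock period r_2 = 1, phase p_2 = 0.1 (numbers of Example 6)
print(next_tick(Fraction("8.5"), 1, Fraction("0.1")))  # 91/10
```

Exact fractions avoid the floating-point case in which an enabling time that lies exactly on a tick would be rounded up to the following tick.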
As a result, the definition of admissible schedules must be extended as follows:
Definition 7 (Admissible Schedule). An admissible schedule function of a mixed asynchronous-synchronous system is a function $\tau: V \to \mathbb{R}_{\ge 0}$ that satisfies $\tau_i(k) = \bar{\tau}_i(k) + p_i$ and $\bar{h}_{ij} = h_{ij} + p_i - p_j$, where 1. …

Example 6. The same system as in Example 5 is considered. Its unfolding is shown in Fig. 5, where each node is annotated with its earliest possible firing time $\tau_j(k)$. For example, at node $v_2(2)$ it can be seen that the second firing of the synchronous node $v_2$ happens at time $\tau_2(2) = 9.1$, because $\tau_1(2) + 3.5 = 8.5$ has been rounded up to the next integer plus $p_2 = 0.1$.
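The earliest firing times annotated in such an unfolding can be computed by letting every node fire as soon as it is enabled and delaying synchronous nodes to their next clock tick. The Python sketch below does this under assumptions: firing times for $k \le 0$ are 0 as in Definition 3, a node enabled exactly at a tick fires at that tick, and the names `free_schedule` and `clocks` as well as the arc tuples are hypothetical. It is not the authors' implementation.

```python
import math
from fractions import Fraction
from graphlib import TopologicalSorter


def free_schedule(n, arcs, clocks, K):
    """As-soon-as-possible firing times tau[j][k] (k = 1..K) of a MASS.

    arcs: (i, j, d_ij, h_ij) tuples; clocks: {node: (r_j, p_j)} for synchronous
    nodes only. Asynchronous nodes fire when enabled; synchronous nodes are
    delayed to the first tick p_j + n*r_j not earlier than their enabling time.
    """
    preds = {j: [] for j in range(n)}
    token_free = {j: set() for j in range(n)}
    for i, j, d, h in arcs:
        preds[j].append((i, d, Fraction(h)))
        if d == 0:
            token_free[j].add(i)
    order = list(TopologicalSorter(token_free).static_order())

    def to_tick(j, a):
        if j not in clocks:
            return a  # asynchronous node: no clock constraint
        r, p = (Fraction(x) for x in clocks[j])
        return p + r * math.ceil((a - p) / r)

    tau = [[Fraction(0)] * (K + 1) for _ in range(n)]  # column 0 stands for k <= 0
    for k in range(1, K + 1):
        for j in order:
            enabled = max(((tau[i][k - d] if k > d else Fraction(0)) + h
                           for i, d, h in preds[j]), default=Fraction(0))
            tau[j][k] = to_tick(j, enabled)
    return tau


fig7 = [(0, 1, 1, "1.9"), (1, 0, 1, "1.7")]  # one token per arc (assumption)
tau = free_schedule(2, fig7, {0: (1, "0.1"), 1: (1, "0.6")}, K=5)
print([float(t) for t in tau[0][1:]])  # [2.1, 5.1, 7.1, 10.1, 12.1]
```

For the two synchronous nodes of Fig. 7 with phases $p_1 = 0.1$ and $p_2 = 0.6$, the firing times of $v_1$ advance by 5 every two firings, i.e., an average period of 2.5, which matches case 2 of Example 9.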
Looking at the definition of an unfolding, it is obvious that one possible maximal-rate schedule is obtained if a node fires as soon as it is enabled. This fact is elaborated in the following theorem.
Theorem 2 (Free Schedule) The following conditions for the free schedule of a MASS are equivalent:
1. There is no admissible schedule with smaller firing times $\tau_j(k)$.

At first, some bounds on the average period of the maximal-rate schedule determined above will be given. These bounds can be computed efficiently (in polynomial time). In particular, they can be related to the maximal cycle mean of marked graphs with appropriately chosen weights.

… $\tilde{\tau}_i(k)$ hold for all nodes $v_i \in V$, $k \in \mathbb{Z}_{>0}$, then $P_{cm}(\hat{G}) \le P_{min} \le P_{cm}(\tilde{G})$ holds, as $P_{min}(\hat{G}) = P_{cm}(\hat{G})$ and $P_{min}(\tilde{G}) = P_{cm}(\tilde{G})$.
For all nodes without predecessors, we have $\hat{\tau}_i(k) = \tilde{\tau}_i(k) = \tau_i(k) = 0$. Consequently, the initial conditions of all unfoldings are identical, with $\hat{\tau}_i(k) = \tau_i(k) = \tilde{\tau}_i(k)$.
Let us consider an arc $(v_i(k - d_{ij}), v_j(k))$ in the unfoldings of $G$, $\hat{G}$ and $\tilde{G}$. We will now show that the inequalities implied by such an arc in $G$: …

Example 7. Consider the MASS on the left side of Fig. 6 with an asynchronous node $v_1$ and a synchronous node $v_2$. In the middle and on the right-hand side of Fig. 6, the associated marked graphs $\tilde{G}$ and $\hat{G}$ are shown, respectively. The maximal cycle means of the associated marked graphs, which determine the bounds for $P_{min}$, are $P_{cm}(\tilde{G}) = 3.25$ and $P_{cm}(\hat{G}) = 3.1$. Without considering the unfolding of $G$, we can thus say according to Theorem 3 that $3.1 \le P_{min}(G) \le 3.25$. By determining a maximal-rate schedule for $G$ using Theorem 2, we get $P_{min}(G) = 28/9 \approx 3.11$.

As a result of the above theorem, we have $P_{cm}(\hat{G}) = P_{min} = P_{cm}(\tilde{G})$ for the subclass of MASS that do not have arcs from asynchronous to synchronous nodes, because $\hat{G} = \tilde{G}$ in this case. Note that this subclass of MASS includes asynchronous systems ($V = V_A$) and GALS (globally asynchronous locally synchronous systems) ($V = V_S$) as special cases.
It turns out that one can analyze the behavior of this subclass in much more detail. In particular, we are interested in determining a maximal-rate periodic schedule in this case. We already know that the corresponding period is $P_{cm}(\hat{G})$, where $\hat{G} = (V, A, d, \lceil \bar{h} \rceil)$. As …

Proof: The proof is based on the fact that in the cases $v_i, v_j \in V_S$ and $v_j \in V_A$, the inequalities which determine the firing times in $G$ and $\hat{G}$ are identical, i.e., … $= 0$. Therefore, we have $\tau_i(k) - p_i = \hat{\tau}_i(k)$.
Finally, the following theorem and the corresponding constructive proof lead to periodic maximal-rate admissible schedules for a subclass of mixed asynchronous-synchronous systems.

Thus far, we know from Theorem 3 that the average period $P_{min}$ of a MASS $G$ can be bounded by $P_{cm}(\hat{G}) \le P_{min} \le P_{cm}(\tilde{G})$. The bounds on the minimal average period strongly depend on the values $\hat{h}$ and $\tilde{h}$ of $\hat{G}$ and $\tilde{G}$, respectively. By definition, $\bar{h}_{ij} = h_{ij} + p_i - p_j$. Hence, the minimal average period depends on the given phase assignment of the nodes.
Let us suppose we have the freedom to adjust the clock phases of the synchronous nodes of a given MASS. Then we might be able to find a combination of phases leading to another MASS with a smaller average period. Hence, the objective of this section is to investigate the effect of the phases of synchronous nodes on the minimal average period of topologically equivalent MASS graphs.
Example 9. Consider the MASS of Fig. 7 with two synchronous nodes $v_1$ and $v_2$ and holding times $h_{12} = 1.9$ and $h_{21} = 1.7$. The average periods for different combinations of clock phases can be computed as follows:
1. $p_1 = 0.5$, $p_2 = 0.5$ $\Rightarrow$ $\bar{h}_{12} = 1.9$, $\hat{h}_{12} = \lceil 1.9 \rceil = 2$, $\bar{h}_{21} = 1.7$, $\hat{h}_{21} = \lceil 1.7 \rceil = 2$ $\Rightarrow$ $P_{cm}(\hat{G}) = P_{min} = 2$.
2. $p_1 = 0.1$, $p_2 = 0.6$ $\Rightarrow$ $P_{cm}(\hat{G}) = P_{min} = 2.5$.
3. $p_1 = 0.2$, $p_2 = 0.0$ $\Rightarrow$ $P_{cm}(\hat{G}) = P_{min} = 2.5$.

Fig. 8 is a density plot of the period $P_{min}$ as a function of $p_1$ (x-axis) and $p_2$ (y-axis). For all combinations of clock phases $p_1, p_2 \in [0, 1)$, it turns out that only two different periods exist.
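The three cases and the observation about Fig. 8 can be reproduced from the rounded weights $\hat{h}_{ij} = \lceil \bar{h}_{ij} \rceil = \lceil h_{ij} + p_i - p_j \rceil$ of $\hat{G}$ for clock period 1. The Python sketch below assumes two initial tokens on the single cycle of Fig. 7 (consistent with $P_{cm}(G_A) = 1.8$ in Example 11) and therefore only computes the mean of that one cycle; the helper names are hypothetical.

```python
import math
from fractions import Fraction


def hat_weights(arcs, phases):
    """Rounded weights h_hat_ij = ceil(h_ij + p_i - p_j) for clock period 1."""
    return [(i, j, d, math.ceil(Fraction(h) + phases[i] - phases[j]))
            for i, j, d, h in arcs]


def single_cycle_period(arcs, phases):
    """P_cm(G_hat) when the arcs form exactly one cycle: sum(h_hat) / sum(d)."""
    hat = hat_weights(arcs, phases)
    return Fraction(sum(h for _, _, _, h in hat), sum(d for _, _, d, _ in hat))


fig7 = [(0, 1, 1, Fraction("1.9")), (1, 0, 1, Fraction("1.7"))]  # two tokens (assumption)
for p1, p2 in [("0.5", "0.5"), ("0.1", "0.6"), ("0.2", "0.0")]:
    print(p1, p2, single_cycle_period(fig7, [Fraction(p1), Fraction(p2)]))

grid = [Fraction(n, 20) for n in range(20)]  # phases 0, 0.05, ..., 0.95
print(sorted({single_cycle_period(fig7, [a, b]) for a in grid for b in grid}))
```

The three cases give 2, 5/2 and 5/2, and the phase sweep finds only the periods 2 and 5/2. This also follows from the rounding: since $\bar{h}_{12} + \bar{h}_{21} = 3.6$ for any phases, the sum of the two ceilings is either 4 or 5.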
This simple example shows that the average period may vary with the assigned clock phases. We therefore consider the optimal phase assignment problem. Before we introduce the optimization problem, we define when a MASS $G$ is called phase optimal.
Definition 8. Given the set of MASS $M = \{G = (V, A, d, h, p) : 0 \le p_i < 1\}$, we define $P_{opt} := \min_{G \in M} P_{min}(G)$ and call $P_{opt}$ the optimal average period of the set of MASS $M$. A MASS $G \in M$ is said to be phase optimal if $P_{min}(G) = P_{opt}$.
In other words, a MASS $G$ is called phase optimal if there exists no MASS $G'$ with the same topology and the same weight and distance functions, but a different phase assignment $p$ and a smaller average period. In this section, we are going to address the following questions:
How can we find the minimal period $P_{opt}$ if the clock phases of all synchronous nodes are freely adjustable?
Sometimes, some clock phases are fixed and others are not. What is the best adjustment of the remaining clock phases in this case?

We will answer these questions by proposing a simple optimization procedure. First, we introduce an exact procedure for finding an optimal phase assignment if the MASS contains no asynchronous nodes. In the subsequent section, we will consider the general case and introduce a polynomial-time procedure for finding a MASS $G \in M$ such that $P_{min}(G) \le P_{opt} + 1$.
Exact phase optimization
From Theorem 1, we know that for a MASS containing no arcs from asynchronous to synchronous nodes, the free schedule is identical to that of the corresponding marked graph given in …

Theorem 5. Let $P = H/L$, with $H, L \in \mathbb{Z}$ coprime, be an optimal solution of (7) and $p$ an optimal phase assignment. Then the minimal average period of the corresponding MASS is $P_{min} = P = P_{opt}$. Furthermore, the schedule $\tau_i(k + L) = \tau_i(k) + H$ is an $L$-periodic maximal-rate admissible schedule for $G$, where …
Proof: Let the phases of a given MASS be fixed. Then we are able to determine $P_{min}$ by determining the minimal average period of the marked graph $\hat{G}$, i.e., by solving the linear program (see (5)) …

Example 10. Consider the MASS $G$ of Example 9 and Fig. 7. Solving the corresponding MILP (7) yields the solution $P = 2$. From (7), we obtain three different solution polytopes $\mathcal{P}_1$, $\mathcal{P}_2$ and $\mathcal{P}_3$, corresponding to the solutions $\hat{h}_{12} = 3, \hat{h}_{21} = 1$ for $\mathcal{P}_1$, $\hat{h}_{12} = 2, \hat{h}_{21} = 2$ for $\mathcal{P}_2$, and $\hat{h}_{12} = 1, \hat{h}_{21} = 3$ for $\mathcal{P}_3$.
The polytopes are given as follows: 
For example, for $\mathcal{P}_2$, the periodic admissible schedule $\hat{\tau}$ of $\hat{G}$ satisfies:
for some real parameter. Fixing this arbitrary time shift to 2, we obtain admissible schedules for $G$ corresponding to $\mathcal{P}_2$ as:
$$\tau_1(k) = \lceil 2k \rceil + p_1 = 2k + p_1, \qquad \tau_2(k) = \lceil 2k \rceil + p_2 = 2k + p_2,$$

with $-0.1 \le p_2 - p_1 \le 0.3$, $0 \le p_1, p_2 < 1$, and period $P = 2$.
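The three polytopes can be checked numerically. With $\delta = p_2 - p_1$, the rounded weights of $\hat{G}$ are $\hat{h}_{12} = \lceil 1.9 - \delta \rceil$ and $\hat{h}_{21} = \lceil 1.7 + \delta \rceil$. The Python sketch below tabulates them on a grid of phase differences, again assuming two tokens on the cycle of Fig. 7; the function name is hypothetical.

```python
import math
from fractions import Fraction


def rounded_weights(delta, h12=Fraction("1.9"), h21=Fraction("1.7")):
    """(h_hat_12, h_hat_21) of the MASS in Fig. 7 for delta = p_2 - p_1, clock period 1."""
    # h_bar_12 = h_12 + p_1 - p_2 and h_bar_21 = h_21 + p_2 - p_1
    return math.ceil(h12 - delta), math.ceil(h21 + delta)


for n in range(-9, 10):
    delta = Fraction(n, 10)
    w12, w21 = rounded_weights(delta)
    period = Fraction(w12 + w21, 2)  # two tokens on the single cycle (assumption)
    print(f"delta = {float(delta):+.1f}: h_hat = ({w12}, {w21}), period = {float(period)}")
```

The pair (2, 2) of $\mathcal{P}_2$ appears exactly for $-0.1 \le \delta \le 0.3$, as stated above; the pairs (3, 1) and (1, 3) appear for $\delta \le -0.7$ and $\delta \ge 0.9$, and all remaining phase differences give the period 2.5.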
Note that the MILP in Theorem 5 can easily be modified to account for other constraints on the phases $p_i$, e.g., by adding the corresponding inequalities to (7).
A polynomial time procedure for nearly optimal phase assignment
In this section we give an efficient procedure to find a phase assignment $p$ for a set of MASS $M$ such that the corresponding MASS $G \in M$ has a period $P_{min}(G)$ at most one unit greater than $P_{opt}$.
Definition 9. Given the MASS $G = (V, A, d, h, p)$, define $G_A = (V, A, d, h)$ to be a marked graph in which all nodes are assumed to be asynchronous, i.e., $F_j(a) = a$ for all $v_j \in V$ in Definition 7. Since the nodes in $G_A$ may start computation at arbitrary time instances, $P_{cm}(G_A) \le P_{opt}$. In other words, the maximal cycle mean $P_{cm}(G_A)$ of $G_A$ is a lower bound on $P_{opt}$.

Theorem 6. The phase assignment $p$ as described in Lemma 1 leads to a MASS $G$ whose minimal average period satisfies $P_{min}(G) \le P_{opt} + 1$.
Lemma 1 Let
Proof: From Lemma 1 we know that the phase assignment $p$ leads to a MASS $G$ with $P_{min}(G) \le \lceil P_{cm}(G_A) \rceil$. Moreover, we have $P_{cm}(G_A) \le P_{opt}$, which leads to $P_{cm}(G_A) - P_{opt} \le 0$ and, since $\lceil x \rceil - x < 1$, to $\lceil P_{cm}(G_A) \rceil - P_{opt} < 1$. As a result, we have $P_{min}(G) - P_{opt} < 1$, which implies the claim. The entire procedure to determine $p$ takes $O(|V||A|)$ time according to (4) and (5). The whole procedure is illustrated in the following example.
Example 11. Consider the MASS shown in Fig. 7 again. We obtain $P_{cm}(G_A) = 1.8$ and $\lceil P_{cm}(G_A) \rceil = 2$. Thus, we can use Lemma 1 to determine a phase assignment $p_1$, $p_2$ such that the resulting MASS $G$ satisfies $P_{min}(G) \le 2$. The following steps lead to a phase assignment according to Lemma 1 and Theorem 6:
We obtain the following 1-periodic schedule for G A : 
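One construction that attains the bound $P_{min}(G) \le \lceil P_{cm}(G_A) \rceil$ when all nodes are synchronous with clock period 1 works as follows: compute a 1-periodic schedule $\tau_i(k) = \tau_i + kP$ of $G_A$ with the integer period $P = \lceil P_{cm}(G_A) \rceil$ and choose each phase as the fractional part of $\tau_i$, so that every firing time falls on a tick of its node's clock. The Python sketch below is an assumption-based illustration of this idea and is not necessarily identical to the construction of Lemma 1; the function name and arc tuples are hypothetical.

```python
import math
from fractions import Fraction


def phases_from_integer_period(n, arcs, P):
    """Hypothetical helper: phases p_i = frac(tau_i) of a 1-periodic schedule
    tau_i(k) = tau_i + k*P of G_A, for an integer period P (clock period 1).

    arcs: (i, j, d_ij, h_ij) tuples. Returns None if P < P_cm(G_A).
    """
    if P != int(P):
        raise ValueError("P must be an integer (clock period normalized to 1)")
    tau = [Fraction(0)] * n  # longest-path values from a virtual source
    for _ in range(n + 1):
        changed = False
        for i, j, d, h in arcs:  # enforce tau_j - tau_i >= h_ij - P*d_ij
            candidate = tau[i] + Fraction(h) - P * d
            if candidate > tau[j]:
                tau[j] = candidate
                changed = True
        if not changed:
            # tau_i + k*P then lies on a tick of a clock with phase frac(tau_i)
            return [t - math.floor(t) for t in tau]
    return None  # still improving after n + 1 passes: positive cycle, P too small


fig7 = [(0, 1, 1, Fraction("1.9")), (1, 0, 1, Fraction("1.7"))]  # one token per arc (assumption)
print(phases_from_integer_period(2, fig7, 2))
```

For the MASS of Fig. 7 with $P = 2$, the offsets are $\tau_1 = \tau_2 = 0$ and hence $p_1 = p_2 = 0$; by Example 10 this phase difference lies in $\mathcal{P}_2$, so the resulting period is 2.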
Modeling Issues of Implementation
Finally, we would like to show how the MASS model may be used for representing relevant aspects of the design of mixed asynchronous-synchronous systems.
Modeling of finite queue sizes
It is possible to model data-driven communication involving a queue of limited size. In this case, the node set $V$ represents the set of computation modules, and the arc set $A$ represents unidirectional communication channels with FIFO-like buffering capabilities. On the circuit level, the self-timed communication may be realized using a handshake protocol (e.g., …).
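A common way to express a queue of limited size in a marked graph is to add, for each bounded channel $(v_i, v_j)$ with capacity $c$, a backward arc $(v_j, v_i)$ carrying $c - d_{ij}$ tokens, so that the tokens on both arcs always sum to $c$. This is shown here as an assumption-based sketch, not as the exact construction of Fig. 9a. The Python code uses hypothetical arc tuples `(i, j, d_ij, h_ij)`, a hypothetical dictionary mapping arc indices to capacities, and assumes a holding time of 0 on the backward arc.

```python
def bound_queues(arcs, capacity):
    """Add a backward arc (j, i, c - d_ij, 0) for every bounded channel (i, j).

    arcs: (i, j, d_ij, h_ij) tuples; capacity: {arc index: c} (hypothetical format).
    Tokens on the forward and backward arc always sum to c; the backward arc is
    assumed to have holding time 0.
    """
    bounded = list(arcs)
    for idx, (i, j, d, _h) in enumerate(arcs):
        c = capacity.get(idx)
        if c is None:
            continue  # unbounded channel
        if c < d:
            raise ValueError(f"arc {idx}: capacity {c} is smaller than its {d} initial tokens")
        bounded.append((j, i, c - d, 0))
    return bounded


print(bound_queues([(0, 1, 0, 2)], {0: 1}))  # [(0, 1, 0, 2), (1, 0, 1, 0)]
```

With capacity 1 and no initial token, the backward arc carries one token, so the producer cannot fire again before the consumer has fired; the new cycle then bounds the period by the sum of the holding times along it.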
Computing holding times and clock periods
In this section we show how to derive holding times $h_{ij}$ and clock periods $r_j$ from a typical specification of a mixed asynchronous-synchronous system. In a MASS, a synchronous node $v_j$ represents a synchronous subsystem, e.g., a synchronous finite state machine or a pipelined digital filter. On the system level, the characteristics of such a synchronous module may be given by its latency of $L_j$ clock cycles (the number of clock cycles from the arrival of the inputs until the corresponding outputs are available) and its absolute clock period $T_j$ in time units, e.g., nanoseconds (ns).
Example 13. Consider again Example 1 and the corresponding Fig. 1. Let us suppose that the microcontroller (MC) is connected to a host as depicted in Fig. 10a. Consequently, we may model the host and the microcontroller by two synchronous nodes $v_1$ and $v_2$, see Fig. 10b. Let the clock periods of these modules be $T_1 = 100$ ns and $T_2 = 50$ ns, let the computation time of module $v_1$ be $L_1 = 2$ clock cycles, and let $L_2 = 3$ denote the number of clock cycles needed for module $v_2$ to carry out a computation. Hence, the computation time of module $v_1$ is $2 \times 100$ ns $= 200$ ns, and that of module $v_2$ is $3 \times 50$ ns $= 150$ ns. The modules are connected together and communicate in a data-driven manner; the resulting MASS graph is shown in Fig. 10b. Details on the communication between the microcontroller and the servo are not given.
This computation shows how to include the performance figures of synchronous subsystems in our system-level MASS graph. Hence, we can use tools like retiming and pipelining [14] to change the numbers $L_j$ and $T_j$, respectively, and study their influence on the system level by simulating the corresponding MASS graphs. Therefore, our model allows synchronous subsystems to be optimized and developed separately and their effects to be evaluated efficiently.
Simulation of synchronous systems
The MASS model can also be used for modeling synchronous systems. The starting point is a given synchronous signal flow graph consisting of combinational modules and their interconnections, which may contain synchronous registers. For the sake of simplicity, we assume only one clock phase to clock all of the registers (i.e., single clock, monophase, edge-triggered). There are many ways of modeling such a system in the MASS model. For example, if only the sequence of operations is of concern, one may simply replace the registers by initial tokens and the combinational modules by asynchronous nodes of a marked graph.
Another possibility is the modeling shown in Figs. 9b and 9c: The synchronous registers are replaced by synchronous nodes as shown in Fig. 9b. The combinational modules are modeled using asynchronous nodes whose holding times are the delays of the corresponding combinational modules, see Fig. 9c. The timing corresponds to that of the synchronous system if input tokens are continuously available and if all accumulated delay times between registers are smaller than the clock period.
The next possibility takes into account the global clock generation of a synchronous system. An extra clock node is responsible for the simultaneous transfer of tokens to all synchronous nodes each of which models one register. The connection between the clock node and the synchronous nodes is bidirectional and corresponds to an interconnection with a queue length of one. This prevents the accumulation of clock tokens and guarantees that if the longest delay path between two registers (synchronous nodes) is larger than the clock period, the period of the whole system is slowed down correspondingly. The modeling of the registers is shown in Fig. 9d where combinational modules are modeled as in Fig. 9c. 
Interfacing
Finally, some remarks concerning interfacing asynchronous and/or synchronous subsystems are presented. The MASS shown in Fig. 9e models the situation where node $v_1$ belongs to an asynchronous (or synchronous) subsystem, whereas node $v_2$ models the 'first' register of a synchronous subsystem. This input register accepts tokens only at integral time instances. The holding time $h$ represents the sum of the communication time and the processing time of the synchronous node $v_2$, e.g., for executing the communication protocol. If the asynchronous part is faster than the synchronous one, tokens will accumulate in the queue $(v_1, v_2)$. This can be avoided by using finite-length queues, as in Fig. 9a. If the asynchronous part is slower, the synchronous node $v_2$ does not process a token every clock tick, but all other nodes in the synchronous subsystem do. This behavior can be implemented in hardware by sending a token bit in addition to the data, which signals that a register contains valid data. The clock operates continuously. In essence, the operation of an asynchronous system is simulated by the synchronous one.
Another possibility is shown in Fig. 9f . Here, the concept of a global clock node is used, which is bidirectionally connected to all synchronous nodes of the synchronous subsystem. The incoming token is split into one that represents the data and another one that serves to enable the clock signal. In a circuit implementation, this corresponds to stopping the clock or gating the clock of the synchronous subsystem if no input data is present.
If the output of the synchronous subsystem is connected to an asynchronous subsystem, similar situations occur.
Conclusions and Future Research
Of further interest may be combining high-level synthesis tools for synchronous circuits (see [28] for an overview) and for asynchronous circuits (see the papers edited in [6, 7]), and investigating the influence of local node optimizations on the system-level performance, given, e.g., by a MASS graph. The MASS model may also be useful in the domain of interface synthesis, which synergistically combines asynchronous and synchronous design styles. Due to the proven efficiency of the presented analysis algorithms, we believe that the MASS model is an interesting model for system-level CAD tools.
The first author would like to thank the DFG for the grant (TE163/4-1) that supported his work at UC Berkeley. S. Sriram was supported by SRC under grant 94-DC-008. Thanks also to Edward Lee for many helpful comments on the subject of this paper, and to the Ptolemy project for supplemental support.
