Abstract
Introduction
Asynchronous circuits present properties that meet the requirements for large, complex systems [1, 2]: no clock skew, modular interconnectivity, low peak currents, and performance determined by average processing speeds. The design of asynchronous systems based on self-timed circuits [1] has become more broadly accepted in recent years. Most work on design automation for asynchronous systems has focused on logic synthesis of sequential machines [3]. Currently, significant effort is being invested in the synthesis of hazard-free circuits from Signal Transition Graphs.
Other approaches synthesize asynchronous circuits from high-level specifications by syntax-directed translation, according to production rule sets. In [5] and [6] a similar strategy is used to translate such specifications into delay-insensitive circuits. We cannot, however, place these approaches in the category of high-level synthesis, since no attempt is made to improve the quality of the circuit (size and performance) by using optimization techniques such as operation scheduling and hardware allocation [7]. Syntax-directed translation generates circuits whose size depends linearly on the size of the input description [5].
The efforts on high-level synthesis [7] have mainly focused on synchronous designs. Clear evidence of this tendency is that the proposed scheduling [...] [10] consider the possibility of having synchronous operations with unbounded delays.
From the point of view of the timing model used for operation scheduling and control synthesis, asynchronous systems present two significant differences:
Time is considered as a continuous variable and initiation and completion of operations are events that can occur at any instant.
Operations have variable, data-dependent delays.
Therefore, different scheduling strategies must be conceived if an asynchronous timing model is considered. This paper aims at the definition of basic concepts, data structures, and primitive functions for asynchronous scheduling algorithms and control synthesis.
The paper is organized as follows. Section 2 describes the architecture model considered for the asynchronous execution of operations. Section 3 presents an overview of a high-level synthesis system. Section 4 defines the basic data structures and functions proposed for scheduling algorithms and presents two algorithms for operation scheduling. Section 5 describes the connection between module binding and process synchronization and how control can be synthesized. Conclusions and future work are presented in section 6.
Asynchronous Architecture Model
A high-level synthesis system requires a target architecture model for the mapping of high-level objects (operations, variables, data transfers) into hardware modules (ALUs, registers, multiplexors). This section presents a short summary of the architecture model proposed in [11] (only details referring to operation scheduling and control synthesis will be described).
Data-Path
The data-path is composed of self-timed blocks (ALUs, registers, multiplexors, etc.) synchronized by means of a handshaking protocol implemented with two signals: request and completion [1]. Registers are implemented as latches (the request signal indicates when latching must be initiated). Each hardware module is considered a process which executes operations and synchronizes with other processes when data transfers are required. The granularity of the control distribution may vary for the sake of circuit performance. Hereafter, and without loss of generality, we will consider that a local controller (LC) exists for each computation block (or register) and its input multiplexors. The execution of an operation has the following steps:
1. Input data are read from registers.
2. Multiplexors transfer input data to their corresponding functional unit input.
3. Operation is executed in a functional unit.
4. Multiplexors transfer output data to a register.
5. Output data is latched into a register.
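The five steps can be read as a chain of request/completion handshakes, where the completion event of one block acts as the request of the next. The following Python fragment is only a sketch of that reading: the class `SelfTimedBlock`, the function `execute_operation`, and all delay values are hypothetical and are not taken from the architecture model of [11]. Fixed delays are used for readability, although real delays are variable and data-dependent.

```python
from dataclasses import dataclass
from typing import List, Tuple

Trace = List[Tuple[str, float, float]]  # (block name, request time, completion time)


@dataclass
class SelfTimedBlock:
    """Hypothetical self-timed block driven by a request/completion handshake."""
    name: str
    delay: float  # illustrative value; real delays are variable and data-dependent

    def request(self, at: float, trace: Trace) -> float:
        """Receive a request at time `at`; return the time of the completion event."""
        done = at + self.delay
        trace.append((self.name, at, done))
        return done


def execute_operation(steps: List[SelfTimedBlock], start: float = 0.0) -> Tuple[float, Trace]:
    """Chain the steps: each completion event is the request of the next block."""
    trace: Trace = []
    t = start
    for block in steps:
        t = block.request(t, trace)
    return t, trace


# One operation, following the five steps listed above (delays are made up).
STEPS = [
    SelfTimedBlock("read input registers", 1.0),
    SelfTimedBlock("input multiplexors", 0.5),
    SelfTimedBlock("functional unit", 3.0),
    SelfTimedBlock("output multiplexors", 0.5),
    SelfTimedBlock("latch output register", 1.0),
]
finish, trace = execute_operation(STEPS)
```

Because no clock is involved, the finishing time is simply the sum of the delays actually incurred, which is why performance depends on average rather than worst-case processing speed.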
High-Level Synthesis: an Overview
The system input is a behavior description of the circuit described by a Control Data Flow Graph (CDFG). The first step performed is Scheduling and Allocation (figure 2). Allocation selects the number and type of hardware modules that will compose the data-path. In operation scheduling, operations are distributed through the time space and are assigned to a type of functional unit (FU). The output is a Scheduled Data Flow Graph (SDFG), which is a modification of the previous CDFG containing information related to scheduling and allocation. In Resource Binding, operations are bound to hardware modules. The output is a Bound Data Flow Graph (BDFG) where each operation has been bound to an FU instance and each variable to a register. After binding, the set of operations that will be executed in each process is totally defined and the behavior of the local controllers can be derived, as explained in section 5.
Scheduling
We will represent the scheduling problem with a data flow graph (DFG) 
From now on, the following assumptions, which are close to reality, will be considered:
Since control is evenly distributed all over the circuit, it is assumed that the delays introduced by the LCs are constant.
Finding an asynchronous schedule means defining a partial ordering of the vertices and allocating each operation to a type of FU so that the total estimated processing delay is minimized.
Two functions, find_free_interval and reserve_interval, have been defined for managing the Event List [12].
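The definitions of the two event-list functions are given in [12] and are not reproduced here. As a rough illustration of what such primitives could look like, the sketch below keeps one event list per FU type and stores each reservation as an (initiation, completion) pair of event times. The class name `EventList`, the `instances` parameter, and both signatures are assumptions made for this example.

```python
from typing import List, Tuple


class EventList:
    """Reservations on one FU type, stored as (initiation, completion) event pairs."""

    def __init__(self, instances: int = 1) -> None:
        self.instances = instances  # number of allocated FUs of this type (assumed >= 1)
        self.reserved: List[Tuple[float, float]] = []

    def _busy(self, t: float) -> int:
        """Number of FUs executing at instant t (intervals are half-open)."""
        return sum(1 for s, e in self.reserved if s <= t < e)

    def _fits(self, start: float, delay: float) -> bool:
        """True if usage stays below the number of FUs during [start, start + delay)."""
        end = start + delay
        # Usage only rises at initiation events, so probing those instants is enough.
        probes = [start] + [s for s, _ in self.reserved if start < s < end]
        return all(self._busy(t) < self.instances for t in probes)

    def find_free_interval(self, earliest: float, delay: float) -> float:
        """Earliest start >= earliest at which an operation of length `delay` fits."""
        # The earliest feasible start is `earliest` itself or some completion event.
        candidates = sorted({earliest} | {e for _, e in self.reserved if e > earliest})
        return next(t for t in candidates if self._fits(t, delay))

    def reserve_interval(self, start: float, delay: float) -> None:
        """Record the initiation and completion events of a scheduled operation."""
        self.reserved.append((start, start + delay))


# Usage: one FU, busy during [0, 3) and [5, 7).
single = EventList(instances=1)
single.reserve_interval(0.0, 3.0)
single.reserve_interval(5.0, 2.0)
gap_start = single.find_free_interval(0.0, 2.0)    # fits in the gap [3, 5)
late_start = single.find_free_interval(0.0, 3.0)   # too long for the gap
```

Since time is continuous, the search works on event instants rather than on control steps; a linear scan is used here for clarity, not for efficiency.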
Data Structures
f ∈ FU(v0).e
Event-List-Based Scheduling
Two algorithms for operation scheduling in asynchronous systems are presented in this section: ELS (Event-List Scheduling) and ELLAS (Event-List Look-Ahead Scheduling). Both algorithms select vertices to be scheduled according to a priority function. They differ in the calculation of the priority function: in the former the priority of each vertex is calculated at the beginning of the algorithm, while in the latter it is dynamically evaluated as a result of a look-ahead scheduling function.
During the execution of a scheduling algorithm the set of nodes of the DFG can be partitioned into three subsets. Information about utilization of resources is kept in the event lists, and managed by the functions find_free_interval and reserve_interval. When an operation can be executed by more than one type of FU, ELS tends to bind vertices with higher priority to faster FUs. The time complexity of ELS is O(n log n).
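A static-priority list scheduler in the spirit of ELS can be sketched as follows. Several choices here are assumptions, since the text does not specify them: the priority of a vertex is taken to be its longest path to a sink using the fastest available delays, ties are broken by vertex name, and each vertex goes to the FU type that completes it earliest. The helper `find_free_interval` is a simplified stand-in for the event-list function, and the sketch does not attempt to reach the O(n log n) bound.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def find_free_interval(reserved: List[Interval], instances: int,
                       earliest: float, delay: float) -> float:
    """Earliest start >= earliest where usage stays below `instances`."""
    def busy(t: float) -> int:
        return sum(1 for s, e in reserved if s <= t < e)

    def fits(start: float) -> bool:
        probes = [start] + [s for s, _ in reserved if start < s < start + delay]
        return all(busy(t) < instances for t in probes)

    candidates = sorted({earliest} | {e for _, e in reserved if e > earliest})
    return next(t for t in candidates if fits(t))


def els(succs: Dict[str, List[str]], delays: Dict[str, Dict[str, float]],
        instances: Dict[str, int]) -> Dict[str, Tuple[str, float, float]]:
    """Return vertex -> (fu_type, start, completion) for an acyclic DFG."""
    preds: Dict[str, List[str]] = {v: [] for v in succs}
    for v, ws in succs.items():
        for w in ws:
            preds[w].append(v)

    # Static priority (assumption): longest path to a sink with the fastest delays.
    priority: Dict[str, float] = {}

    def path(v: str) -> float:
        if v not in priority:
            tail = max((path(w) for w in succs[v]), default=0.0)
            priority[v] = min(delays[v].values()) + tail
        return priority[v]

    for v in succs:
        path(v)

    event_lists: Dict[str, List[Interval]] = {f: [] for f in instances}
    schedule: Dict[str, Tuple[str, float, float]] = {}
    while len(schedule) < len(succs):
        ready = sorted(v for v in succs if v not in schedule
                       and all(p in schedule for p in preds[v]))
        v = max(ready, key=lambda u: priority[u])  # first maximal vertex wins ties
        earliest = max((schedule[p][2] for p in preds[v]), default=0.0)
        best = None
        for f, d in delays[v].items():
            start = find_free_interval(event_lists[f], instances[f], earliest, d)
            if best is None or start + d < best[2]:
                best = (f, start, start + d)
        event_lists[best[0]].append((best[1], best[2]))  # reserve the interval
        schedule[v] = best
    return schedule


# Two multiplications feeding one addition; the FU names and delays are made up.
SUCCS = {"m1": ["a1"], "m2": ["a1"], "a1": []}
MUL = {"fast_mul": 2.0, "slow_mul": 3.0}
DELAYS = {"m1": MUL, "m2": MUL, "a1": {"adder": 1.0}}
result = els(SUCCS, DELAYS, {"fast_mul": 1, "slow_mul": 1, "adder": 1})
```

In this small case the first multiplication takes the fast multiplier; the second would finish at time 4 if it waited for it, so it is placed on the slow multiplier instead and finishes at time 3.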
As mentioned before, ELLAS dynamically calculates each vertex's priority by using a look-ahead scheduling function. The time complexity of ELLAS is O(n^3 log n) [12].
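The look-ahead function itself is described in [12]. One plausible reading, shown below purely as an illustration, scores each ready vertex by tentatively scheduling it and completing the rest of the schedule with a cheap greedy pass; the vertex with the smallest resulting total delay is scheduled next. All names (`place`, `greedy_finish`, `ellas`) are hypothetical, and resource usage is simplified to a per-instance "free at" time without gap filling, which is weaker than a real event list.

```python
from typing import Dict, List, Tuple

Schedule = Dict[str, Tuple[str, float, float]]  # vertex -> (fu_type, start, completion)


def place(v: str, preds, delays, free_at, schedule: Schedule) -> None:
    """Schedule v on the FU type that completes it first (no gap filling)."""
    ready_at = max((schedule[p][2] for p in preds[v]), default=0.0)
    f, start = min(
        ((f, max(ready_at, min(free_at[f]))) for f in delays[v]),
        key=lambda c: c[1] + delays[v][c[0]],
    )
    i = free_at[f].index(min(free_at[f]))   # earliest-free instance of type f
    free_at[f][i] = start + delays[v][f]
    schedule[v] = (f, start, start + delays[v][f])


def ready_set(preds, schedule: Schedule) -> List[str]:
    """Unscheduled vertices whose predecessors are all scheduled, in name order."""
    return sorted(v for v in preds if v not in schedule
                  and all(p in schedule for p in preds[v]))


def greedy_finish(preds, delays, free_at, schedule: Schedule) -> float:
    """Look-ahead: complete a copy of the partial schedule, return its total delay."""
    free_at = {f: list(ts) for f, ts in free_at.items()}
    schedule = dict(schedule)
    while len(schedule) < len(preds):
        place(ready_set(preds, schedule)[0], preds, delays, free_at, schedule)
    return max(c for _, _, c in schedule.values())


def ellas(preds, delays, instances: Dict[str, int]) -> Schedule:
    """Dynamic priority: pick the ready vertex whose look-ahead delay is smallest."""
    free_at = {f: [0.0] * n for f, n in instances.items()}
    schedule: Schedule = {}
    while len(schedule) < len(preds):
        def lookahead(v: str) -> float:
            trial_free = {f: list(ts) for f, ts in free_at.items()}
            trial = dict(schedule)
            place(v, preds, delays, trial_free, trial)
            return greedy_finish(preds, delays, trial_free, trial)

        v = min(ready_set(preds, schedule), key=lookahead)
        place(v, preds, delays, free_at, schedule)
    return schedule


# "b" -> "c" is the critical chain; "a" competes with "b" for the single ALU.
PREDS = {"a": [], "b": [], "c": ["b"]}
DELAYS = {"a": {"alu": 1.0}, "b": {"alu": 2.0}, "c": {"mul": 3.0}}
INSTANCES = {"alu": 1, "mul": 1}
naive_delay = greedy_finish(PREDS, DELAYS, {"alu": [0.0], "mul": [0.0]}, {})
result = ellas(PREDS, DELAYS, INSTANCES)
ellas_delay = max(c for _, _, c in result.values())
```

In this example the name-ordered greedy pass alone schedules "a" before "b" and ends at time 6, whereas the look-ahead version starts the critical chain first and ends at time 5. Evaluating up to n candidates per step, each with a full scheduling pass, is also consistent with the higher complexity quoted for ELLAS.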
Results
Figure 4: Delay Matrix used for the benchmarks.
In this section, the scheduling algorithms previously presented are evaluated. Two benchmarks have been chosen to present the results of the experiments: the Differential Equation Solver [8] and the Fifth-order Wave Digital Filter [13]. The library used for both benchmarks is represented by the Delay Matrix depicted in figure 4. [...] small size of the problem. Figure 5 depicts the resulting schedule for one of the experiments. For the Elliptic Filter, ELLAS is superior to ELS in most cases, at the cost of higher (but still moderate) CPU times. Figure 6 shows the schedule obtained with ELLAS for one of the experiments. It is worth emphasizing the ability of the algorithm to bind critical operations to fast FUs and non-critical operations to slow FUs when they can be executed concurrently.
Binding and Process Synchronization
Module binding is performed after scheduling and allocation in order to bind operations to hardware module instances. The criterion used for binding aims at reducing the connectivity of the circuit so that routing area and communication delays are minimized. In that respect, binding algorithms already proposed for synchronous circuits can also be used for asynchronous circuits (if they do not use control-step-[...]).
Once binding is performed, the sequence of operations to be executed in each process is completely defined. Each data transfer between two operations corresponds to a synchronization between processes when the operations are bound to different hardware modules. In figure 7 each arrow corresponds to a synchronization between processes for the scheduling example of figure 5a.
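The relation between binding and synchronization can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: only operation-to-FU binding is modeled (register transfers are ignored), and the function names, module names, and example graph are all hypothetical.

```python
from typing import Dict, List, Tuple


def synchronizations(edges: List[Tuple[str, str]],
                     binding: Dict[str, str]) -> List[Tuple[str, str]]:
    """Data transfers whose two operations are bound to different modules."""
    return [(u, v) for u, v in edges if binding[u] != binding[v]]


def process_sequences(start: Dict[str, float],
                      binding: Dict[str, str]) -> Dict[str, List[str]]:
    """Per-module (per-process) operation sequence, ordered by scheduled start."""
    sequences: Dict[str, List[str]] = {}
    for op in sorted(binding, key=lambda o: (start[o], o)):
        sequences.setdefault(binding[op], []).append(op)
    return sequences


# Made-up example: two multiplications feed a1, whose result feeds a2.
EDGES = [("m1", "a1"), ("m2", "a1"), ("a1", "a2")]
BINDING = {"m1": "MUL1", "m2": "MUL2", "a1": "ALU1", "a2": "ALU1"}
START = {"m1": 0.0, "m2": 0.0, "a1": 3.0, "a2": 4.0}

syncs = synchronizations(EDGES, BINDING)
sequences = process_sequences(START, BINDING)
```

Here the transfer from a1 to a2 stays inside the process of ALU1 and therefore needs no synchronization, while the two transfers from the multipliers do.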
Each data transfer between a computational block and a register requires an explicit synchronization between the processes corresponding to each of the in- 
Conclusions and future work
As the design of asynchronous circuits gains acceptance, tools for high-level synthesis become more necessary. To the knowledge of the authors, this paper has presented the first approach to scheduling for the high-level synthesis of asynchronous circuits. For an asynchronous timing model, in which no control steps exist, scheduling means defining a partial ordering of the execution of the operations. Basic data structures and primitive functions for the management of operation initiation and completion events have been defined. Two algorithms, ELS (O(n log n)-time) and ELLAS (O(n^3 log n)-time), have been proposed and evaluated. Further research is required in this emerging area. Among the issues not considered in this paper, we mention some of the most significant: scheduling across basic blocks, pipelined functional units, and scheduling under timing constraints. On the other hand, further research is also required in the area of distributed control synthesis for asynchronous systems.
