THE RAFAEL MULTI-TARGET HETEROGENEOUS SIGNAL-FLOW GRAPH COMPILER by Paller, Gábor & Cséfalvay, Klára
PERIODICA POLYTECHNICA SER. ELECTR. ENG. VOL. .., NO. 3, PP. 201-2:]0 (1997)
THE RAFAEL MULTI-TARGET HETEROGENEOUS 
SIGNAL-FLOW GRAPH COMPILER 
Gábor PALLER and Klára CSÉFALVAY
Department of Electromagnetic Theory
Technical University of Budapest
H-1521 Budapest, Hungary
e-mail: paller@evt.bme.hu
csefalvay@evt.bme.hu
Received: June 23, 1995
Abstract 
This paper describes a signal-flow graph compiler that produces distributed code for
heterogeneous target systems. The compiler is intended mainly for Digital Signal
Processing problems. The code generator features a reprogrammable operation library, the
static scheduler supports fully heterogeneous systems, and the input graph may contain
run-time decisions in a limited way. The system has been implemented on IBM PC
compatibles under MS-Windows, so it does not require an expensive host computer.
Keywords: compile-time scheduling, parallel processing, heterogeneous architectures.
1. Introduction 
Writing programs for modern Digital Signal Processors (DSPs) poses
difficult tasks for software engineers because a painful trade-off
exists between computing power and productivity/task complexity.
Unfortunately, the existing and well-known higher-level programming
environments (for example the 'C' language) perform very poorly on DSP
platforms because, being general languages, they cannot exploit the special
capabilities of DSPs (circular buffers, parallel instructions and so on) or
avoid pipeline effects. This can cause an extremely high performance loss
(as much as 1000% compared to the assembly realization). Several
attempts were made to improve C compilers on DSP platforms (LEARY
and WADDINGTON, 1990), but generally they use system- or DSP-dependent
language extensions and their performance is still not really convincing.
So developers have to choose: either write the DSP code in assembly to
achieve higher performance and thus lower hardware cost, or use a
high-level environment, which speeds up development but decreases the
efficiency of the DSP so that more expensive DSPs must be chosen. It can
even happen that the problem cannot be solved at the high level.
The other problem is the embarrassing abundance of DSP architectures
and languages. One often faces the problem of porting existing results
onto other DSP platforms. If the code is written in assembly, this will be a
long and tiresome process. Some 'common language' is needed, but without
an efficiently realizable high-level platform this solution does not seem
promising. Nowadays the solution is sought in optimized software
libraries (like SPOX) which try to combine the power of assembly
routines with the development efficiency of C. SPOX does accelerate the
development process, but it is a fixed set of routines, and if we extend it
(for example, we need an arithmetic routine or a new algorithm that SPOX
cannot offer) we still have to write it in assembly, losing portability.
Nowadays parallel DSP is in the focus of attention, first of all because
real-world DSP problems often require immense computing power.
A number of existing DSPs can be used for parallel realizations, and some of
them have been designed especially for parallel computing, for example Texas
Instruments' TMS320C40 and TMS320C80 and Analog Devices' ADSP21060.
Task scheduling is an important part of the multiprocessor implementation
of DSP algorithms. It means both partitioning the tasks among
multiple DSPs and scheduling the tasks on each DSP. In the existing parallel
development systems, parallel programs are generally scheduled 'by hand',
which is difficult and, for more complex tasks, cannot be done
effectively. The other approach, used frequently in existing DSP
operating systems, follows the well-proven real-time operating system
scheme (sometimes with time-sliced scheduling added). This scheme
is based on separate tasks and a task scheduler program which switches
tasks when necessary. This task scheduler requires processing time.
A speciality of DSP algorithms is that they do not require many run-time
decisions. A very handy description form for these algorithms is the
signal-flow graph (SFG). A signal-flow graph is a graphical description of
an algorithm in which computations are represented by graph nodes and
the dependencies among the computations by graph branches. If we can cluster
together enough nodes whose dependency graph and execution time do
not depend on the input values, we can schedule at compile time, thus
eliminating the processor load of the dynamic scheduler.
Thus the DSP code generation problem is the following: we need
a system which is flexible enough to be adapted to several existing DSP
platforms, avoids the power loss of the high-level languages, solves the
partitioning and scheduling problems and, in addition, is easy to use for the
DSP algorithm developer, who is generally not a programmer. A proposal
for this problem is presented in this document, which describes Rafael,
an intelligent code generator based on signal-flow graphs.
Rafael was designed as a small, flexible system which can run even on
very small computers (it is implemented under Microsoft Windows on IBM
PC compatible computers). It is an SFG compiler integrated into a simple
framework which allows DSP algorithms to be described in SFG form;
the compiler translates this description into a program for heterogeneous
multiprocessor hardware. The compiler distributes the SFG over the
multiprocessor system, schedules the operations on each processor, creates the
communication scheme among the processors and generates an executable
assembly source program for each processor. Rafael features a programmable
DSP database and code generator library, so it can be adapted easily to any
processor. The small resources of the host computer do not allow us to
compete with the comprehensive features of existing SFG compilers hosted on
workstations, but we hope to prove that Rafael can compete successfully
with those systems in several domains.
2. Existing Data-Flow Compilers 
A number of block-diagram based design systems have been introduced in
the literature. We mention here the commercially available DSPlay
(Burr-Brown) and SPW (Signal Processing Workstation) (Comdisco) systems.
DSPlay is PC-based; it can simulate the input block diagram and can
generate code for the AT&T DSP32. The Comdisco system started as a simple
simulator, but it is now able to produce highly optimized code for almost
all DSP types and can even generate circuit descriptions. As of June
1994, partitioning onto a multiprocessor DSP system still had to be done
by hand. The Cathedral system (DE MAN et al., 1986; LANNEER, 1993),
devoted to circuit synthesis, features SFG partitioning and scheduling, but it
uses the Silage functional language (GENIN et al., 1990) as its input. The
Ptolemy system (BUCK et al., 1991; BUCK, 1993; BUCK et al., 1994) is the
most comprehensive existing simulation/code generation system. Ptolemy
supports the coexistence of different computation models (called domains
in its terminology) and offers a clearly defined object-oriented interface
for defining a new domain. Existing domains include static dataflow (LEE
- MESSERSCHMITT, 1987), dynamic dataflow (BUCK, 1993), discrete event,
message queue and communicating process (BUCK et al., 1994) models.
Ptolemy makes almost no assumption about the internal structure of the
computation models it supports; this is both the biggest strength and the
biggest weakness of the system. It is a strength because it allows modelling
the whole system, including its software, hardware and communication parts,
in one framework. It is a weakness because Ptolemy allows mixing computation
models that do not coexist well; it does not enforce a good design style.
Nevertheless, Ptolemy has had a huge impact on the field, and its importance
grows continuously as existing computation models and tools are integrated
with it.
Many ideas in the structure of Rafael were borrowed from the now
historical Gabriel system. Gabriel was phased out in favor of the much
bigger Ptolemy system, but we found that some solutions introduced in
Gabriel fit our much less powerful target platform well. Gabriel (LEE
et al., 1989) was the first system at Berkeley capable of generating
executable code in which the synchronous dataflow paradigm was implemented.
Its predecessor, BLOSIM (MESSERSCHMITT, 1984), was only a simulator.
The operations (or actors in the terminology of the Berkeley team)
are called stars. A cluster of stars forming an interconnected SFG is called
a galaxy. The final SFG can be hierarchical, composed of a number of
galaxies; a set of interconnected galaxies is called a universe. Gabriel has two
levels of user interface. The graphical dataflow organization is used where
appropriate: when describing the algorithm in dataflow format. The stars
have a textual definition. This mixed description form helps to avoid the
common problem of graphical description systems, which use graphical
terms where they are not handy.
One of the most striking features of Gabriel is its programmable star
library, which greatly influenced the database of our Rafael system. A Gabriel
star is described by a Lisp structure. The star library entry has a header
and a function body. The header structure stores information about the
inputs and outputs of the operation, a short textual description for
human readers, and the parameters with their default values. An entry in
the header points to the star function, which is executed whenever the
star is invoked. This star function can either execute the operation
assigned to the star (in simulation mode) or generate code for the
actual target processor (in code generation mode). It is important to note
that the code generator star library is written in Lisp, so a code generator
function can be quite intelligent when it decides on the text to be
generated depending on the parameters, the size of the inputs, etc. Besides the
star function, a Gabriel star can have initialization/termination functions that
are called once before the first invocation and once after the last invocation
of a star. Processors are described in a similar way, by Lisp lists that
contain the target system characteristics: number of processors, processor
memory, special hardware units connected to processors, communication
channel characteristics between the processors and communication code
generator routines. The Gabriel system is strictly homogeneous: there can
be only one star library in memory.
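The star library entry described above (a header plus a star function that works in simulation or code generation mode) can be pictured with a minimal sketch. It is written in Python rather than Lisp, and every name in it (`Star`, `add_star_function`, the emitted assembly-like text) is a hypothetical illustration, not Gabriel's actual library format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Star:
    """Hypothetical star library entry: a header plus a star function."""
    name: str
    inputs: List[str]
    outputs: List[str]
    description: str
    params: Dict[str, int] = field(default_factory=dict)  # default parameter values
    function: Optional[Callable] = None                    # called on every invocation

def add_star_function(mode, params, args):
    """Execute the addition (simulation) or emit target text (code generation)."""
    size = params["size"]
    if mode == "simulate":
        a, b = args
        return [x + y for x, y in zip(a[:size], b[:size])]
    # Code generation mode: the generated text adapts to the parameters.
    if size == 1:
        return ["ADD in1,in2,out"]
    return [f"LOOP {size}", "  ADD (in1)+,(in2)+,(out)+", "ENDLOOP"]

adder = Star("ADD", ["in1", "in2"], ["out"], "vector adder",
             params={"size": 4}, function=add_star_function)

simulated = adder.function("simulate", adder.params, ([1, 2, 3, 4], [10, 20, 30, 40]))
generated = adder.function("codegen", adder.params, None)
```

The same entry serves both modes, which is the point of keeping the generator in an interpreted language: the branch on `size` stands for the kind of decision a code generator function can take at generation time.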
The Gabriel system has the following interesting features:
- It handles multiple sample rates, which result naturally from its input
format, the synchronous dataflow graph.
- It has a second user level, the star library programming level in Lisp,
which allows the user to create new stars easily and to add intelligent
optimization/code generation features to the existing star library.
The main weaknesses:
- It does not address the question of data-dependent constructs (if-then-else,
case, etc.).
- It does not support heterogeneous systems.
- Its scheduler cannot be considered efficient.
Another system that greatly influenced our work is SynDEx (SOREL,
1994). SynDEx is a code generator environment designed to be interfaced
with the compilers of the synchronous languages SIGNAL (LE GUERNIC et al.,
1991), LUSTRE (HALBWACHS et al., 1991) and ESTEREL (BOUSSINOT - DE
SIMONE, 1991). It has a graphical and textual user interface that allows
users to construct the algorithm block diagram entirely in SynDEx. It
is designed, however, rather to receive the algorithm graph from a
synchronous language compiler. At present SynDEx is interfaced in this way
with SIGNAL (BOURNAI, 1994), and work is under way to create a common
format for the SIGNAL, LUSTRE and ESTEREL languages so that they can
send the result of compilation to SynDEx or other code generators. The
algorithm model of SynDEx is the conditioned signal-flow graph. This means
that each node is associated with a clock, which results in a condition
input for each node (Fig. 1).
[Figure: clock inputs 1 and 2 feed Boolean operators, which produce the condition input of a data operation with a data output]
Fig. 1. Conditioned signal-flow graph
A node is fired if all its input variables (including the control variable) have
been produced by predecessor nodes and its control variable is true. The
scheduler treats the condition input dependency like any other dependency:
this is equivalent to supposing that each condition is true and each
node can be executed. In this way the original conditioned signal-flow graph
is transformed into a synchronous signal-flow graph and static scheduling can
be used. The original conditioned signal-flow graph is thus partitioned into
a condition calculating part (which is unconditioned) and a data processing
part (which can be conditioned). It is the responsibility of the SIGNAL
compiler (or the input graph designer) to assign a proper condition signal
to each node.
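The firing rule and the scheduler's treatment of condition inputs can be sketched as follows. This is an illustrative Python model only: the dictionary layout and the names `dependencies`, `can_fire` and `static_order` are assumptions, not SynDEx data structures.

```python
def dependencies(node):
    """Data inputs plus the condition input, if the node has one."""
    cond = node.get("condition")
    return list(node["inputs"]) + ([cond] if cond else [])

def can_fire(node, produced, values):
    """Run-time rule: all inputs produced and the control variable true."""
    if not all(name in produced for name in dependencies(node)):
        return False
    cond = node.get("condition")
    return cond is None or bool(values[cond])

def static_order(nodes):
    """Compile-time order: the condition input counts as an ordinary
    dependency, i.e. every condition is assumed to be true."""
    produced, order, pending = set(), [], dict(nodes)
    while pending:
        ready = sorted(n for n, d in pending.items()
                       if all(i in produced for i in dependencies(d)))
        if not ready:
            raise ValueError("cyclic dependency")
        for name in ready:
            order.append(name)
            produced.add(pending.pop(name)["output"])
    return order

graph = {
    "clock": {"inputs": [], "output": "c"},          # condition calculating part
    "source": {"inputs": [], "output": "x"},
    "filter": {"inputs": ["x"], "condition": "c", "output": "y"},
}
order = static_order(graph)
```

Here `filter` always appears in the static order after `clock` and `source`, while `can_fire` decides at run time whether it actually executes.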
The biggest problem with the SynDEx system is caused by the way
it handles conditions. The current implementation does not use the
condition tree (AMAGBEGNON et al., 1994) constructed laboriously by the
SIGNAL compiler: the hierarchy of clocks disappears and all the clocks become
'level 1' clocks (inserted just under the root clock). The code generator
does not group operations scheduled one after the other with the same
conditions into one if ... endif. Other drawbacks are that SynDEx does
not support heterogeneous architectures and that it can generate only C code.
3. Major Design Considerations of the Rafael System 
The Rafael structure was designed according to the four main goals
introduced at the beginning of this chapter. The support of heterogeneous
systems needed a flexible operation library or, even better, a programmable
code generator module. Considering the code generator programmer's
convenience, compiled languages can be quickly eliminated because they would
require recompiling and relinking the code generator modules each time
the database is modified. A system constructed in this way would also be much
more prone to system crashes, as compiled languages allow great liberty in
manipulating the system resources. We decided that the reprogrammable parts
of the code generator should be implemented in an interactive, interpreted
language. As we intended to allow considerable intelligence
in these modules (as they determine the quality of the code generated),
we wanted to choose a more powerful language. Considering the possible
candidates, we chose Lisp because of the following advantages:
- It is a very powerful language that allows run-time program creation
and it is equipped with efficient database handling capabilities.
- Lisp interpreters are available in versions with relatively small memory
requirements, which fit the small computer (PC) we planned the
system to run on.
- Excellent public domain versions have been written and distributed
in source code for several platforms.
- It is a common language in CAD systems.
We must consider, however, the slow execution speed of Lisp, which is
an even more serious obstacle on a small PC system. Although in terms
of ease of programming it would have been more advantageous to realize the
system entirely in Lisp, this solution would have resulted in unacceptable
run time on the target system.
4. The Structure of the Rafael System 
For the reasons mentioned in the previous section we chose a hybrid
structure, depicted in Fig. 2.
[Fig. 2: block diagram of the Rafael system; surviving label: Output code]
Each part of the software where user modifications are not expected was
implemented in C++. This gives us a relatively powerful language with
acceptable execution speed. Programmability is provided at the Lisp level,
where an interface has been defined for the database and code generator
programmer. By means of this interface the user can extend the database
and the code generator library. The compiler core calls these routines from
the C++ level and uses their return values appropriately.
This solution needed separate tasks and interprocess communication
between the tasks. The minimal 'operating system' that was sufficiently
popular and needed small resources was Microsoft Windows. At that
time Linux (a small Unix version for PCs) was not in a state in which we could
have considered it an alternative to Windows. In my personal
opinion Windows is a poorly designed, inefficient 'operating system'; today
we would choose some other platform.
Thus, Rafael was implemented under MS-Windows; the parts of this
software (Fig. 2) run as separate Windows tasks and are connected
through the interprocess communication channels of Windows. The
popular Xlisp was chosen as the Lisp interpreter for Rafael because it is close to
Common Lisp and is available in C source. Xlisp was ported to the Windows
platform and the necessary interprocess routines were inserted that allow
this Lisp interpreter to run as a server task.
The three Rafael software components have the following tasks.
Graph editor. The name is a bit of an exaggeration, as the Rafael
framework is far from a comfortable working environment. It features a
multi-screen text editor for creating/modifying graphs in textual format,
initializes the Xlisp server and launches the Rafael compiler on the graph
currently being edited.
Graph compiler. This is the SFG compiler. The program analyses the
graph description, performs the scheduling and generates the output text. It
can also run standalone, not only from the framework.
Lisp interpreter. The operation database and its associated code
generator routines are realized in Lisp. The client programs launch the
server and send requests to it through interprocess links. Requests are
Lisp commands which are executed by the server, and the result
of the Lisp command evaluation is returned to the calling C++ program.
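Since requests are plain Lisp commands, the client side essentially has to build a command string. The following Python sketch shows one way such a string could be assembled; the function name `exec-time` and the argument layout are invented for illustration and are not Rafael's actual interface, and the interprocess transport is left out.

```python
def to_lisp(value):
    """Serialize a Python value as Lisp source text (strings become symbols)."""
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(to_lisp(v) for v in value) + ")"
    return str(value)

def make_request(function, *args):
    """Build the text of one request; arguments are quoted so that the
    server passes them to the function without evaluating them."""
    parts = [function] + ["'" + to_lisp(a) for a in args]
    return "(" + " ".join(parts) + ")"

# Hypothetical query: execution time of a 4-element ADD on processor P1.
request = make_request("exec-time", "ADD", "P1", [4])
```

The server would evaluate the string and send the printed result back; parsing that reply is the mirror image of `to_lisp`.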
As we can see, the Rafael software architecture is very similar to that of
Gabriel, hence the similarity of the names. Rafael differs from Gabriel
in the following points:
- Rafael's whole structure is adapted to the small host systems it runs
on. Only a part of the compiler was implemented in Lisp, not the
whole.
- As we will see, Rafael's whole design, including the database and the
scheduler it uses, is adapted to heterogeneous systems. Gabriel was
multi-target as it supported multiple star libraries. Rafael is truly
heterogeneous, as multiple target processors can coexist in the same
operation library.
- Rafael supports a limited form of run-time decisions, as their importance
has been underlined many times both in the literature and in
practical engineering work. This will be detailed in section 6.
- Rafael features more advanced and efficient scheduler algorithms.
5. Rafael Nodes and Connections 
The Rafael software model defines nodes that represent certain operations 
and connections between them. Nodes can be of the following types. 
Operations. Operations cover functions attached to a certain node.
An operation is a parametrizable function. The number of inputs and outputs,
the execution time and the operation of the function itself can depend on
constant parameters.
Probes. Probes cover functions whose task is to acquire input data
from the environment of the dataflow system and send output data to
the environment of the dataflow system. Probes are treated as simple
operations (with non-zero execution time, if necessary), the only difference 
is that they are explicitly forced to certain processors by the user. It 
derives from the fact that in a given hardware system the input and output 
hardware are assigned to prescribed processors. 
Delays. Delays are special operators in the sense that they consist
of two parts: a delay input (where new data is put into the delay) and
a delay output (where data is retrieved from the delay). Rafael always
treats the delay parts as two distinct operations. It is guaranteed, however,
that the output of a delay is always scheduled before the input of the same
delay.
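One consequence of treating the two delay parts as distinct operations is that a feedback loop through a delay no longer forms a cycle in the dependency graph. A minimal Python sketch, assuming a simple edge-list graph and using the standard `graphlib` module (the node naming `d.out` / `d.in` is an invention of this example):

```python
from graphlib import TopologicalSorter

def split_delays(edges, delays):
    """Replace each delay d by 'd.out' (yields stored data) and 'd.in'
    (stores new data), and force d.out to precede d.in."""
    result = []
    for src, dst in edges:
        src = f"{src}.out" if src in delays else src
        dst = f"{dst}.in" if dst in delays else dst
        result.append((src, dst))
    result += [(f"{d}.out", f"{d}.in") for d in sorted(delays)]
    return result

def schedule(edges):
    """Return one dependency-respecting order of the nodes."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)          # dst depends on src
    return list(sorter.static_order())

# Accumulator: add reads the delay output and writes the delay input.
edges = [("x", "add"), ("z", "add"), ("add", "z"), ("add", "y")]
order = schedule(split_delays(edges, {"z"}))
```

With the delay left as a single node, `add -> z -> add` would be a cycle and no static order would exist; after the split, `z.out` comes first, then `add`, then `z.in`.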
Each node input/output can have a type. A type is a character string
which is checked for matching when node inputs/outputs are connected.
Rafael allows dynamic type names, resolved at compile time, that match
every static type name, and it resolves the type name ambiguities. In Rafael
dynamic type names start with the 'TYPE' string; for example 'TYPE23'
is a dynamic type string. An adder that can add any type of data can
have the 'TYPE23' type on each of its inputs/outputs. When any of the
inputs/outputs is connected to an output/input with a static type, the
checker replaces the dynamic type with the static type. For example, if
the output of the hypothetical adder above is connected to an input
with the 'TIME' type, 'TYPE23' is replaced by 'TIME' for all the adder
inputs/outputs and type checking continues on the inputs. Fig. 3 illustrates
the process.
[Figure: an adder whose dynamic-type ports are resolved to TIME by a connection; a FREQ-typed connection to the same adder is marked "Type error"]
Fig. 3. Propagating type names in Rafael
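The replacement of a dynamic type name by a static one can be sketched in a few lines of Python. The port naming (`add.in1`), the `groups` table and the `FREQ` type used below are assumptions of this example; the case of two dynamic names meeting is deliberately left unresolved.

```python
def is_dynamic(type_name):
    """Dynamic type names start with the 'TYPE' string."""
    return type_name.startswith("TYPE")

def connect(types, groups, a, b):
    """Check one connection between ports a and b.
    types:  port -> type name
    groups: dynamic type name -> all ports of the node that share it"""
    ta, tb = types[a], types[b]
    if is_dynamic(ta) and not is_dynamic(tb):
        for port in groups[ta]:
            types[port] = tb              # resolve the whole node at once
    elif is_dynamic(tb) and not is_dynamic(ta):
        for port in groups[tb]:
            types[port] = ta
    elif not is_dynamic(ta) and ta != tb:
        raise TypeError(f"{a}:{ta} does not match {b}:{tb}")

types = {"add.in1": "TYPE23", "add.in2": "TYPE23", "add.out": "TYPE23",
         "sink.in": "TIME", "src.out": "FREQ"}
groups = {"TYPE23": ["add.in1", "add.in2", "add.out"]}

connect(types, groups, "add.out", "sink.in")   # adder ports become TIME
```

After this call a further connection from the `FREQ` output to `add.in1` raises a `TypeError`, which corresponds to the type error case of the figure.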
Depending on the operation library, 'tokens' can have arbitrary size. The
current Rafael operation library supports one-dimensional vector tokens.
6. Rafael Software Model 
Rafael accepts a restricted version of synchronous dataflow graphs (LEE
- MESSERSCHMITT, 1987) for scheduling. The restriction is that if a
node output produces, or a node input consumes, more than one token, it can be
connected only to an input or output that consumes or produces exactly one token.
See Fig. 4 for an example. This simplified scheme allows Rafael to support
the practically relevant upsampling/downsampling operations without running
into a problematic loop scheduling problem (BHATTACHARYYA - LEE, 1994).
[Figure, two panels: "Allowed connections in Rafael" and "Not allowed connections in Rafael"]
Fig. 4. Rafael's restricted ...
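Read literally, the restriction says that on every connection at least one side must move exactly one token per firing. A short Python sketch of that check, together with the firing counts that balance a single connection (both function names are hypothetical):

```python
def connection_allowed(produced, consumed):
    """A port moving more than one token may only face a port moving one."""
    return produced == 1 or consumed == 1

def firings(produced, consumed):
    """Smallest (source, destination) firing counts with
    source * produced == destination * consumed."""
    if not connection_allowed(produced, consumed):
        raise ValueError("connection not allowed in the restricted model")
    # One of the two rates is 1, so (consumed, produced) is already minimal.
    return consumed, produced

downsample = firings(1, 4)   # source fires 4 times per destination firing
upsample = firings(4, 1)     # destination fires 4 times per source firing
```

Because one rate is always 1, the firing ratio on each connection is an integer, which is what keeps the schedule free of the general loop scheduling problem.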
Rafael has two software models. The first one is a classical synchronous
dataflow model which does not allow run-time decisions. This model has
proved to be too restrictive, but it is the most efficient one. It allows
all kinds of supported operations in the dataflow graph, but no conditional
structures are permitted; we will call it the static model. The
static scheduler is invoked for this graph and a single-block schedule
is generated. This model is a restricted version of the second one,
which allows run-time decisions.
Based on the conditioned dataflow model of synchronous languages, a
conditioned block dataflow model was implemented in Rafael; we will call it
the dynamic model. Inserting if ... endif constructs around each operation
and considering all conditions true is an evident but not very efficient
solution to the run-time decision problem. Instead, Rafael forces the SFG
designer to group parts of the graph into blocks. A block contains a graph
portion for which the following holds true:
1. Inside a block the graph portion is a synchronous dataflow graph
without run-time decisions.
2. All the operations in this block depend on the same condition.
Outside the blocks only probes and blocks are allowed. This is called
the root level. Operations are embedded into blocks; this is the block level.
The simple scheduling scheme used in Rafael solves the scheduling
problem in two passes.
1. First it prepares a static schedule for each block independently.
Variables are propagated through the root-level block connections and
the static scheduler is invoked for the block.
2. Dynamic root-level scheduling. Blocks are considered as operations
which run on all the processors at the same time. A list scheduler
traverses the block connections and builds the order of the blocks
considering only dependency relations. During execution a block may
or may not be executed, depending on its condition input variable (if
any).
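The two passes can be sketched as follows. In this Python illustration the block-level static schedules are taken as given (simple lists of operation names), and only the root-level ordering and the conditioned execution are modelled; the block names, dependencies and data layout are assumptions of the example.

```python
def root_order(blocks, deps):
    """Root-level pass: order the blocks using dependency relations only.
    deps maps a block to the set of blocks whose outputs it reads."""
    order, done = [], set()
    while len(order) < len(blocks):
        ready = [b for b in blocks if b not in done and deps.get(b, set()) <= done]
        if not ready:
            raise ValueError("cyclic block dependency")
        order.append(ready[0])
        done.add(ready[0])
    return order

def run_iteration(order, static_schedules, conditions):
    """One iteration: a block's precomputed static schedule is executed
    only if the block is unconditioned or its condition is true."""
    trace = []
    for block in order:
        if conditions.get(block, True):
            trace.extend(static_schedules[block])
    return trace

blocks = ["A", "B", "C"]
deps = {"A": {"C"}, "B": {"A"}}                     # hypothetical connections
static_schedules = {"A": ["A1", "A2"], "B": ["B1"], "C": ["C1"]}

order = root_order(blocks, deps)
trace = run_iteration(order, static_schedules, {"A": False})
```

The block order is fixed at compile time; only the decision to run or skip a block is taken at run time, so no dynamic scheduler is needed on the target.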
[Figure: static model graph with inputs INP1 and INP2 and output OUT1]
Fig. 5. Example static model
Fig. 7 demonstrates this method on the example dynamic model graph in 
Fig. 6. 
Advantages of the conditioned block schedule are the following:
- We can provide conditional structures while preserving static
scheduling.
- The user of the system is forced to group nodes with the same
condition together; the performance loss resulting from repeated
conditional statements is thus avoided.
- The static scheduling algorithm estimates reality much better
than in the SynDEx case. As a block contains only synchronous
dataflow, the static schedule is always exact, not only in the worst
case as in SynDEx.
- The SIGNAL compiler readily performs the operation grouping itself.
We have to mention the following disadvantages:
[Figure: root-level graph with Blocks A, B and C connected to input and output probes; Blocks A and B each have a condition input]
Fig. 6. Example of dynamic model graph
[Figure: static schedules on processors P1 and P2 for Block C, Block A and Block B, and the dynamic schedule running Block C, Block A and Block B in sequence on P1 and P2 (block executions are conditioned)]
Fig. 7. Example of dynamic model scheduling
- If the blocks contain too few operations, the static schedules of the blocks
can be too sparse. In this case even true dynamic scheduling could
provide a better solution.
- It is very easy to construct an incorrect graph. Consider the graph
in Fig. 8. In this example Block B depends on Block A, and in the
root-level dynamic scheduling it is scheduled after Block A. It cannot
be guaranteed, however, that Block A was really executed, because this
depends on a run-time decision. If the condition of Block A is not
true, Block B will get its input from obsolete temporary variables,
producing a bad result. As Rafael makes no effort to check the
calculation of condition variables, these situations cannot be signaled by
the compiler.
- Another effect of the fact that Rafael does not analyse the condition
calculation is that all the condition variables must be recalculated in
each iteration. We can recall that the SIGNAL compiler laboriously
optimizes the condition tree so that its output program can be the 'laziest',
which means that the if ... endif structures belonging to a clock
expression on a lower level of the clock tree are appropriately nested
into the if ... endifs of upper-level clocks. The scheme presented above
flattens the clock tree, putting all clock expressions on level 1.
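The stale-input hazard described in the second item can at least be flagged conservatively by looking at the root-level connections alone. The Python sketch below is not a Rafael feature (the text states that Rafael performs no such check); it is an illustration under the assumption that each block carries the name of its condition variable, and it may report false positives because it does not analyse how the conditions are calculated.

```python
def stale_input_warnings(connections, condition_of):
    """Flag root-level connections whose producer is a conditioned block
    and whose consumer is not guarded by the same condition variable."""
    warnings = []
    for producer, consumer in connections:
        cond = condition_of.get(producer)
        if cond is not None and condition_of.get(consumer) != cond:
            warnings.append((producer, consumer))
    return warnings

# Hypothetical root-level graph: C feeds A, A feeds B.
connections = [("C", "A"), ("A", "B")]
condition_of = {"A": "cond_a", "B": "cond_b"}       # C is unconditioned

warnings = stale_input_warnings(connections, condition_of)
```

Only the connection from Block A to Block B is reported: B may run in an iteration in which A was skipped.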
In spite of these disadvantages we consider that the Rafael conditioned
block model successfully avoids dynamic scheduling, and in the case
of large static blocks and few decisions (which is often true of a DSP
algorithm) it is sufficiently efficient.
[Figure: root-level graph with Blocks A, B and C, where Block B takes its input from the conditioned Block A; Blocks A and B have separate condition inputs]
Fig. 8. Example of possibly erroneous graph
7. Rafael Hardware Model
Rafael assumes an arbitrary number of interconnected, heterogeneous
processors as the target system. The communication hardware connecting these
processors can be heterogeneous as well. The static scheduling algorithm
requires, however, that the execution times of operations on all the processors
of the target system and the communication times on all the channels in the
target system be known in advance. These calculation/communication
times can depend on certain parameters: in the case of calculations these
parameters are defined by the operation type, in the case of communication
the time depends on the number of data units passed between the processors.
Rafael uses a simplified communication model; critics say it is
oversimplified. Rafael considers the communication structure totally
interconnected but allows different communication costs for the two directions
of each channel. The current Rafael implementation does not have a router
algorithm, so if the target architecture is not totally interconnected, a virtual
communication layer must be provided by the operation library programmer.
The basic Rafael communication notion is the channel. Channels are
resources that are shared by processor pairs wishing to communicate. A
channel is assigned to each processor pair, and that channel is occupied for
the length of the communication between that processor pair. Other
processor pairs having the same channel number have to wait with their request
until the channel is free. Channels represent hardware resources used for
communication (bus, network, communication links, etc.). The processor
pair to channel number assignment is fixed in the hardware database.
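The channel notion amounts to a shared-resource table keyed by processor pair. A minimal Python sketch, in which the class name, the time units and the processor names are all assumptions of the example:

```python
class ChannelTable:
    """Hypothetical hardware database fragment: processor pair -> channel."""

    def __init__(self, assignment):
        self.assignment = assignment   # {(sender, receiver): channel number}
        self.free_at = {}              # channel number -> time it becomes free

    def reserve(self, sender, receiver, request_time, duration):
        """Occupy the pair's channel for `duration`; if another pair is
        still using the same channel, the request waits until it is free.
        Returns the (start, end) times of the communication."""
        channel = self.assignment[(sender, receiver)]
        start = max(request_time, self.free_at.get(channel, 0))
        self.free_at[channel] = start + duration
        return start, start + duration

# P1-P2 and P3-P4 share channel 0 (e.g. a common bus); P1-P3 has its own link.
table = ChannelTable({("P1", "P2"): 0, ("P3", "P4"): 0, ("P1", "P3"): 1})
first = table.reserve("P1", "P2", request_time=0, duration=5)
second = table.reserve("P3", "P4", request_time=2, duration=3)   # must wait
third = table.reserve("P1", "P3", request_time=2, duration=3)    # independent
```

Giving two pairs the same channel number is how a shared bus is expressed; giving every pair its own number models dedicated links.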
Each communication activity can have three properties, which are
returned by the hardware database functions to the compiler core.
Activity time. This is the time during which the communication activity
occupies the processor it is scheduled on. If the communication hardware
needs constant interaction with the processor (buffered serial line hardware,
for example), the activity time is the same as the time required for the
communication activity. In the case of DMA it is the DMA initialization
time.
Survive time. This is the time which is needed to finish the
communication after the activity itself finishes. For example a DMA is initialized
during the activity time, then it accomplishes the task. During the survive
time the variable which is sent cannot be reused and no new
communication activities can be accomplished on that channel. On the receiving side
all the calculations which need the received variable are delayed until the
end of the survive time.
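The interplay of activity time and survive time can be made concrete with a small timing calculation. The Python sketch below uses invented numbers and names (`CommCost`, `send_timeline`); it only restates the two definitions above.

```python
from dataclasses import dataclass

@dataclass
class CommCost:
    activity_time: int   # processor is occupied by the communication
    survive_time: int    # transfer still in progress, processor already free

def send_timeline(start, cost):
    """Return (processor_free, transfer_done) for a send starting at `start`.
    Until transfer_done the sent variable cannot be reused, the channel is
    busy, and the receiver cannot use the received variable."""
    processor_free = start + cost.activity_time
    transfer_done = processor_free + cost.survive_time
    return processor_free, transfer_done

dma = send_timeline(0, CommCost(activity_time=2, survive_time=10))
serial = send_timeline(0, CommCost(activity_time=12, survive_time=0))
```

Both transfers finish at the same time, but with DMA the processor is free for other operations after 2 time units instead of 12, which the static scheduler can exploit.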
Synchronous flag. This flag controls the scheduling of communication
activities. If this flag is false for a certain communication activity,
the scheduler can put the send activity before the receive activity of the
same communication pair. No 'crosses' are allowed, however (see Fig. 9).
If the synchronous flag is true, the send and receive activities are scheduled
strictly at the same time.
[Figure, two panels showing sends S1, S2 and receives R1, R2: "Valid non-synchronous communication activity arrangement" and "Invalid arrangement ('cross')"]
Fig. 9. Allowed and not allowed communication schemes
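One reading of the 'cross' rule is that, between a given pair of processors, transfers must be received in the same order in which they were sent. Under that assumption (the original figure is the authority here), the check reduces to an order comparison, sketched in Python:

```python
def has_cross(send_order, receive_order):
    """True if some transfer is received before a transfer that was sent
    earlier, i.e. the receive order is not the send order."""
    position = {transfer: i for i, transfer in enumerate(receive_order)}
    ranks = [position[transfer] for transfer in send_order]
    # The sequence is cross-free exactly when the ranks never decrease.
    return any(a > b for a, b in zip(ranks, ranks[1:]))

valid = has_cross(send_order=[2, 1], receive_order=[2, 1])
invalid = has_cross(send_order=[2, 1], receive_order=[1, 2])
```

The first arrangement keeps the order and is accepted; the second swaps the two receives and is rejected as a cross.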
8. Graph Description Language 
The current Rafael implementation does not contain a graph editor; the
user must construct the input algorithm graph himself or herself. A simple
graph description language is used for this purpose, which is described
briefly in this section.
Corresponding to the two software models in Rafael, there are two
variations of the graph description language. In the first variation (synchronous
dataflow) only probes, nodes, delays and connections are allowed. Let us
see an example graph:
PROBE I 1 1 A_TYPE 1 1
PROBE I 2 1 A_TYPE 1 1
PROBE O 7 1
NODE 4 ADD (4)
NODE 5 ADD (4)
NODE 6 ADD (4)
NODE 8 MUL (4)
NODE 3 CONST ((1 2 3 4))
DELAY 9 4 1
CONNECTION 1_1 4_1
CONNECTION 2_1 4_2
CONNECTION 2_1 5_1
CONNECTION 3_1 5_2
CONNECTION 4_1 6_1
CONNECTION 5_1 6_2
CONNECTION 6_1 8_1
CONNECTION 3_1 9_1
CONNECTION 9_1 8_2
CONNECTION 8_1 7_1
PROBE <I/O> <nodenum> <type> <upsample> <downsample>
<I/O> is the input/output probe type, <nodenum> is the number of
the node, <type> is its type name. For the convenience of the compiler,
Rafael stores the relative sample rate of the node in rational form:
<upsample> is the numerator and <downsample> is the denominator of
the relative sample rate (see section 11).
NODE <nodenum> <operation> <parameters>
<nodenum> is the node number, <operation> is the function
attached to the node, <parameters> is the parameter list, which
depends on the function. In the example, the parameter of the ADD
operator determines the size of the vectors to be added.
DELAY <nodenum> <delay size> <delay length>
<nodenum> is the number of the node, <delay size> is the size of
one token it stores, <delay length> is the number of delay stages that
data fed into the delay goes through. Delays explicitly have TYPE
input/output types.
CONNECTION <onode>_<onum> <inode>_<inum>
Defines a connection between the output numbered <onum> of the
node with node number <onode> and the input numbered <inum> of the
node with node number <inode>.
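As an illustration of how such a description could be read, the following Python sketch parses the four statement kinds of the first variation and reports connections that refer to undeclared nodes. It is not part of Rafael: the names `Graph`, `parse_graph` and `dangling_connections` are hypothetical, and the fields that follow the node number are simply kept verbatim because their exact layout is an assumption here.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)        # node number -> (keyword, other fields)
    connections: list = field(default_factory=list)  # ((onode, onum), (inode, inum))


def parse_port(text):
    """Split '<node>_<port>' into a (node number, port name) pair."""
    node, port = text.split("_", 1)
    return int(node), port


def parse_graph(source):
    """Parse PROBE/NODE/DELAY/CONNECTION lines of the flat variation (hypothetical helper)."""
    graph = Graph()
    for raw in source.splitlines():
        fields = raw.split()
        if not fields:
            continue
        keyword = fields[0]
        if keyword == "PROBE":
            # fields[1] is I or O, fields[2] is the node number; the rest is kept as-is.
            graph.nodes[int(fields[2])] = (keyword, fields[1:2] + fields[3:])
        elif keyword in ("NODE", "DELAY"):
            graph.nodes[int(fields[1])] = (keyword, fields[2:])
        elif keyword == "CONNECTION":
            graph.connections.append((parse_port(fields[1]), parse_port(fields[2])))
        else:
            raise ValueError(f"unknown statement: {raw!r}")
    return graph


def dangling_connections(graph):
    """Return the connections whose source or destination node was never declared."""
    return [c for c in graph.connections
            if c[0][0] not in graph.nodes or c[1][0] not in graph.nodes]


EXAMPLE = """\
PROBE I 1 1 A_TYPE 1 1
PROBE O 7 1
NODE 4 ADD (4)
CONNECTION 1_1 4_1
CONNECTION 4_1 7_1
CONNECTION 4_1 9_1
"""

graph = parse_graph(EXAMPLE)
print(sorted(graph.nodes))
print(dangling_connections(graph))
```

The last CONNECTION line deliberately refers to a node 9 that is not declared, so it is the only entry reported by `dangling_connections`.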
The conditioned block dataflow model allows block definitions in addition
to the elements above. In this model only probes, block definitions and
connection definitions are permitted at root level.
BLOCK NADD2 I1->6_1:TYPE1 I2->5_2:TYPE1
I3->5_1:TYPE1 O1->6_1:TYPE1
NODE 5 MUL (4)
NODE 6 ADD (4)
CONNECTION 5_1 6_2
ENDBLOCK NADD2
BLOCK MUL2 C:BOOL I1->6_1:TYPE1 I2->5_2:TYPE1
I3->5_1:TYPE1 O1->6_1:TYPE1
NODE 5 MUL (4)
NODE 6 MUL (4)
CONNECTION 5_1 6_2
ENDBLOCK MUL2
PROBE I 1 1 A_TYPE 1 1
PROBE I 2 1 A_TYPE 1 1
PROBE I 3 1 A_TYPE 1 1
PROBE I 10 1 BOOL 1 1
PROBE O 7 1
NODE 4 NADD2
NODE 5 MUL2
CONNECTION 10_1 5_C
CONNECTION 1_1 4_1
CONNECTION 2_1 4_2
CONNECTION 3_1 4_3
CONNECTION 1_1 5_1
CONNECTION 2_1 5_2
CONNECTION 4_1 5_3
CONNECTION 5_1 7_1
The only new element is the BLOCK ... ENDBLOCK definition
pair. Blocks group their internal nodes into one virtual operator that can
be placed by a NODE definition. An internal node in a block is identified by
its block name and node number. Two blocks can have internal nodes with
the same node number, as internal nodes are invisible outside of a block.
The block header contains the following elements:
I<inputnum>-><inp nodenum>_<inp inputnum>:<typename>
Connects input <inputnum> of the virtual operator represented by
the block to input <inp inputnum> of internal node <inp nodenum>.
The type of the block's input is set to <typename>. Data fed into that
input of the block will be propagated to the internal node's input.
O<onum>-><onodenum>_<out outputnum>:<typename>
Connects output <onum> of the virtual operator represented by the
block to output <out outputnum> of internal node <onodenum>.
The type of the block's output is set to <typename>. Data produced
by that output of the internal node will be propagated through the
output of the virtual node.
C:<typename>
Indicates that the block has a condition input and that the
type of the condition input is <typename>. The condition input can be
referenced as 'C' in the CONNECTION definition.
9. The Database 
Rafael provides a programmable operation and hardware database stored in 
Lisp. The database is accessed by the compiler core through Lisp functions. 
The interface of these Lisp functions is documented so that the database 
programmer can interface to the compiler core. 
The database consists of two parts: the operation database and the hardware
database. The operation database stores the actual function set for all the
supported hardware devices, while the hardware database provides Lisp
functions that can calculate every characteristic of the target hardware
system which is necessary for scheduling and code generation.
The database is handled and maintained through the XLisp
interpreter and stored in Lisp lists. Because XLisp runs under Windows, all
its memory is virtualized, so we can store the whole database in the
memory of XLisp. This greatly simplifies the implementation of the database
management because we simply use the built-in list manipulating functions
of Lisp.
The Operation Database 
The operation database has two parts: operator headers and compilation
strategy functions. The operator headers are stored in lists which are bound
to the operator name. This list stores the following information:
- The name of the compilation strategy routine.
- The description of the input(s) (type, size).
- The description of the output(s) (type, size, storage class, sample rate
factor).
- The execution time in system clock beats.
- Parameters. The parameters and their meaning are defined by the
creator of the operator library. For example the parameters for the
FIR operator can be the length of the filter and the filter coefficients.
The actual values of the parameters are supplied when the user places
an operator; they are passed in the SFG script.
- Constructor and destructor routines. The compiler creates a
constructor function for each operator which requests it. The constructors
are invoked before the operator is executed for the first time. Similarly,
before the SFG execution terminates, destructor functions are called for the
operators which need them.
The data structure above is described in a list like the following:
( ( strategy list )
  ( inputs )
  ( outputs )
  ( time function )
  ( parameters )
  ( constructor strategy list )
  ( destructor strategy list ) )
The strategy list contains the names of the compilation strategy
functions for each hardware device. It has the following format:
( (device1 function1) (device2 function2)
  ... (deviceN functionN) )
The compilation strategy function is called each time during the code 
generation pass when the schedule contains a reference to that function 
and its program text must be generated. This LISP function gets the label 
lists of the input and output branch descriptors (effectively labels of data 
areas where the compiler allocated space for the temporary variables), the 
parameter list (which contains data like coefficient vector of a filter, etc.) 
and returns the program text to the compiler which writes it into the 
output file. The strategy function can decide on the subroutine chosen 
or the form of the generated program text depending on the input and 
output connections and the actual parameters. The subroutine bodies can 
be stored in an ordinary object library, in this case Rafael will place only 
references into the code which can be resolved by the linker which belongs 
to the DSP's development system. This subroutine library can be created 
and maintained by the assembler and library manager tools of the DSP 
development software package. Another design style is to inline all the
operation bodies, which results in slightly faster code but larger code size.
The excellent symbol handling capability of Lisp, which makes
this language so appropriate for artificial intelligence applications, can
be exploited in this system and we can build significant intelligence into
the strategy functions.
The input list stores the description of the operator's inputs. Its format
is the following:
( (type1 size1)
  (type2 size2) ...
  (typeN sizeN) )
where type is the freely chosen signal type (for example time for time 
domain signals) and size is the size of the input vector accepted by this 
node. This size can also be a symbol from the parameter list (for example 
the size of an FFT input can be N where N is a parameter supplied by the 
SFG designer) or even a lambda function of the parameters. The type 
name can be either static or dynamic. Dynamic type names have the form 
of 'TYPEn' where n is an integer number. Dynamic type names are resolved 
when they are connected to a static one.
The output list is similar, but besides type and size it also contains
the storage class specifier and the upsample and downsample factors. Its
format is the following:
( (type1 size1 st1 us1 ds1)
  (type2 size2 st2 us2 ds2) ...
  (typeN sizeN stN usN dsN) )
The storage class specifier shows whether the compiler has to allocate 
space for the output variable or the space is reserved by the operator. 
The us and ds values describe the change in sampling frequency caused by 
the operator. The us denotes the multiplication, ds is the division of the 
sampling frequency. For example the pair 2 1 means interpolation by 2. 
The time function list stores Lisp functions which get the bound
parameter list and return the execution time of the operator on a given
hardware. The list has the following format:
( (device1 lambda1)
  (device2 lambda2) ...
  (deviceN lambdaN) )
where lambda1 ... lambdaN are lambda expressions (no-header Lisp
functions) which compute the execution time for the given device.
The parameter list contains operator-dependent data. For example,
in the case of an IIR filter it contains the sizes of the numerator and
denominator coefficient vectors and the vectors themselves. In the operator
header the list is stored in unbound form (without parameter values); the
editor evaluates this list when placing an operator. The IIR parameter list
would look like the following in unbound form:
(N COEF1 M COEF2)
and in bound form (after the operator has been placed)
(3 (0.34 -0.2 2.12) 4 (0.23 0.77 0.192 2.94))
This bound form is stored in the SFG description file and is passed 
to the execution time computing and strategy functions when necessary. 
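The relationship between the unbound parameter list, its bound form and the per-device time functions can be mirrored in a few lines of Python. This is only a sketch under assumptions: `bind`, `execution_times` and `ADD_TIME` are hypothetical names, the database itself is written in Lisp, and the two cycle formulas are the ones that appear in the ADD database entry shown later in this section.

```python
# Hypothetical Python mirror of an operator header's parameter and time function lists.
IIR_UNBOUND = ["N", "COEF1", "M", "COEF2"]


def bind(unbound, values):
    """Pair each parameter name with the value supplied when the operator is placed."""
    if len(unbound) != len(values):
        raise ValueError("parameter count mismatch")
    return dict(zip(unbound, values))


# Time functions of the ADD operator, one per device, taking the bound parameters.
ADD_TIME = {
    "c30": lambda p: 2 * p["n"] + 10,
    "dsp96k": lambda p: 2 * p["n"] + 5,
}


def execution_times(time_functions, params):
    """Evaluate every device's time function on the bound parameter list."""
    return {device: fn(params) for device, fn in time_functions.items()}


iir = bind(IIR_UNBOUND, [3, [0.34, -0.2, 2.12], 4, [0.23, 0.77, 0.192, 2.94]])
times = execution_times(ADD_TIME, bind(["n"], [4]))
fastest = min(times.values())   # the value a scheduler would use as the minimal time
print(iir["N"], times, fastest)
```

Keeping the time functions as callables rather than constants is what lets the execution time depend on the parameters chosen by the SFG designer.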
The constructor and destructor strategy lists have the same format as
the strategy list. An operator may have constructor and/or destructor
functions, pieces of code which are executed before the operator's first
run and after the operator's last run. If the operator does not need such
functions, NIL is stored instead of the name.
The following small code piece shows the implementation of the ADD 
database entry for the TMS320C30 and DSP96002. 
(setq add '(
  ; c30add is C30 strategy function
  ( (c30 c30add)
  ; dsp96kadd is 96K strategy function
    (dsp96k dsp96kadd) )
  ; Has two inputs, each of size n
  ; (n is the operation parameter)
  ( (type1 n) (type1 n) )
  ; Has one output, size n, automatic storage,
  ; interpolating factor: 1
  ( (type1 n a 1 1) )
  ; Time functions for C30 ...
  ; and 96K
  ( (c30 (+ (* 2 n) 10))
    (dsp96k (+ (* 2 n) 5)) )
  ; Has only one parameter (n)
  ( n )
  ; No constructor for C30 and 96K
  ( (c30 nil) (dsp96k nil) )
  ; No destructor for C30 and 96K
  ( (c30 nil) (dsp96k nil) )
))
Target Hardware Database 
The target hardware database provides the following information to the
compiler core:
- Processor numbers and processor types in the target system.
- Activity and survive times and the synchronous flag for any
communication activity.
- Communication cost estimation for any communication path in the
target system (for the scheduler).
- Channel-processor pair assignment for any processor pair.
A set of Lisp functions must be written for each target system. This is a
relatively inconvenient solution but it allows greater flexibility.
10. Rafael Memory Management 
Rafael allocates memory for temporary variables at compile time. When
the generated program runs on the target system, every variable has already
been assigned a memory address. Rafael implements a simple 'first fit'
dynamic memory allocation scheme when compiling the graph.
When a node is scheduled, Rafael allocates its output variables (the
input variables must have already been allocated). The scheduler keeps
track of the current state of the memory map by means of chunk lists, which
describe what size of blocks are occupied at what address in the
memory of the target processor. When allocating a variable, the memory
manager simply walks this chain and finds the memory block with the
lowest address which is big enough to accommodate the variable to be
allocated.
When an output variable is created, its 'scope' is established. A
variable goes out of scope when all the operations that consume this
variable have already been executed. In this case the memory chunk assigned
to the variable is freed and the place the variable occupied can be reused.
As the scheduler cannot know, when allocating the variable, on which
processor(s) that variable will be consumed, every instance (variable sent
to other processors) of that variable stays 'alive' on every processor until
all operations that consume that variable terminate.
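The allocation and release steps described above can be sketched as a small first-fit allocator. This is a minimal illustration, assuming a single processor and a flat address space measured in words; the class name `FirstFitAllocator` and its methods are hypothetical and do not come from the Rafael sources.

```python
class FirstFitAllocator:
    """Compile-time first-fit allocator over an address-sorted list of occupied chunks."""

    def __init__(self, memory_size):
        self.memory_size = memory_size
        self.chunks = []  # sorted list of (address, size, variable name)

    def allocate(self, name, size):
        """Return the lowest address where `size` words fit and mark them occupied."""
        address = 0
        for index, (start, length, _) in enumerate(self.chunks):
            if start - address >= size:          # the gap before this chunk is big enough
                self.chunks.insert(index, (address, size, name))
                return address
            address = start + length             # skip past the occupied chunk
        if self.memory_size - address >= size:   # gap after the last chunk
            self.chunks.append((address, size, name))
            return address
        raise MemoryError(f"no room for {name} ({size} words)")

    def free(self, name):
        """Release the chunk of a variable that went out of scope."""
        self.chunks = [chunk for chunk in self.chunks if chunk[2] != name]


memory = FirstFitAllocator(memory_size=16)
a = memory.allocate("var1", 4)
b = memory.allocate("var2", 4)
c = memory.allocate("var3", 4)
memory.free("var2")               # all consumers of var2 have been scheduled
d = memory.allocate("var4", 2)    # fits into the hole left by var2
e = memory.allocate("var5", 4)    # the remaining hole is too small, so it goes higher
print(a, b, c, d, e)
```

Because everything happens while the graph is compiled, the addresses returned here would simply be written into the generated code; no allocator runs on the target.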
A variable can be local or global. Local variables are used internally
by blocks. A variable is local if it is created in a block not at root level and
it is consumed only by the operations of that block (so it is not connected
to a block output). Every other variable is global. Blocks have their own
address maps that start at relative address 0. At the end of the scheduling,
when we know how much memory is required for the global variables, local
variable addresses are relocated so that these variables are allocated
starting at the end of the memory allocated for global variables. Local
variables of blocks thus overlay each other (Fig. 10).
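The relocation step can be illustrated with a short sketch. It assumes that each block's local map is given as relative addresses and sizes in words and that a single processor is used; `overlay_blocks` is a hypothetical helper, not a Rafael routine.

```python
def overlay_blocks(global_size, block_maps):
    """
    Relocate block-local variables so that every block's map starts right after
    the global variables. block_maps: block name -> {variable: (relative address, size)}.
    Returns the final addresses and the total memory requirement.
    """
    final = {}
    top = global_size
    for block, variables in block_maps.items():
        for name, (address, size) in variables.items():
            final[(block, name)] = global_size + address
            top = max(top, global_size + address + size)
    return final, top


final, top = overlay_blocks(
    global_size=6,
    block_maps={
        "A": {"var1": (0, 2), "var2": (2, 2)},
        "B": {"var1": (0, 4)},
    },
)
print(final, top)
```

Both blocks start at the same absolute address, so the total requirement is the global area plus the largest block map rather than the sum of all block maps.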
11. Compiler Passes 
The Rafael compiler works in five passes.
Reading Graph Description File 
The compiler reads in the SFG file and parses it syntactically. Then it
analyses the connection definitions and signals connection errors (connecting
to a nonexistent node, a nonexistent input, etc.). During this phase the
compiler rebuilds the tree in the memory of the computer, ready for analysis.
[Fig. 10. Block memory overlaying in Rafael (supposing 1 processor). Panels:
memory map for Block A, memory map for Block B, memory map for root block,
and the final memory map, in which the root block variables start at the base
address and the memory maps of Block A and Block B overlay each other above
them, up to the memory top.]
Type Checking
The compiler resolves the dynamic type names and checks if there are
type errors (see section 4 for further explanation). The type checker is a
recursive routine that propagates the static type names from node to node,
substituting dynamic type names with static ones and signaling errors if a
type name violation is found. The type checking starts at descendants of
probes as they are the only nodes that surely do not have dynamic types.
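A recursive propagation of this kind might look like the following sketch. The data layout (`nodes`, `edges`, `probes`), the exception name and the rule that a dynamic name is scoped to its own node are assumptions made for the illustration; only the 'TYPEn' naming convention and the start at the probes come from the text.

```python
import re

DYNAMIC = re.compile(r"TYPE\d+$")   # dynamic type names have the form TYPEn


class GraphTypeError(Exception):
    pass


def check_types(nodes, edges, probes):
    """
    nodes:  name -> {"in": [type names], "out": [type names]}
    edges:  list of ((source, output index), (destination, input index))
    probes: input probes, whose output types are static
    Returns name -> {dynamic type name: static type name}.
    """
    bindings = {name: {} for name in nodes}
    visited = set()

    def resolve(node, type_name):
        if DYNAMIC.match(type_name):
            return bindings[node].get(type_name)    # None while still unresolved
        return type_name

    def propagate(src):
        visited.add(src)
        for (s, out_idx), (dst, in_idx) in edges:
            if s != src:
                continue
            actual = resolve(src, nodes[src]["out"][out_idx])
            if actual is None:
                continue                             # nothing static to push yet
            declared = nodes[dst]["in"][in_idx]
            expected = resolve(dst, declared)
            newly_bound = expected is None
            if newly_bound:
                bindings[dst][declared] = actual     # dynamic name becomes static
            elif expected != actual:
                raise GraphTypeError(f"{src} -> {dst}: {actual} fed into {expected}")
            if newly_bound or dst not in visited:
                propagate(dst)                       # outputs of dst may now be resolved

    for probe in probes:
        propagate(probe)
    return bindings


nodes = {
    "p1": {"in": [], "out": ["A_TYPE"]},
    "p2": {"in": [], "out": ["A_TYPE"]},
    "add": {"in": ["TYPE1", "TYPE1"], "out": ["TYPE1"]},
    "out": {"in": ["TYPE1"], "out": []},
}
edges = [(("p1", 0), ("add", 0)), (("p2", 0), ("add", 1)), (("add", 0), ("out", 0))]
bindings = check_types(nodes, edges, ["p1", "p2"])
print(bindings["add"], bindings["out"])
```

If the second probe produced BOOL instead of A_TYPE, the second input of the adder would conflict with the binding made through the first input and `GraphTypeError` would be raised.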
IPF Checking
IPF stands for interpolation factor and is used to support Rafael's multirate
features (section 6). IPF is the rate of the node's execution in the multirate
model. IPF is represented by two distinct numbers, the numerator and the
denominator, so IPF: 1/4 means an execution rate of 1/4.
Rafael uses a recursive subroutine similar to the type checker to
propagate IPFs along the graph and looks for the minimal IPF. Propagating
IPF means that the IPF at the input of the operation is multiplied
by the sample frequency multiplication factor stored in the database at the
output description, yielding the output IPF, which is then passed to all the
nodes connected to the outputs. The current implementation of Rafael
prescribes that the output sample rate on all the outputs should be the same.
During the IPF propagation the minimal IPF in the graph is recorded. As IPF is
calculated by division or multiplication by an integer factor, all IPFs in the
graph must be integer multiples of the minimal IPF. So the factor
$$C_{loop} = \frac{IPF_{node}}{IPF_{min}}$$
is the loop count that determines how many times an operation with IPF
$IPF_{node}$ must be repeated if the minimal IPF is $IPF_{min}$. Note that
operations that change the IPF are always executed at the higher of their
input and output sample rates (Fig. 11).
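The propagation and the loop count can be sketched with exact rational arithmetic. The sketch below assumes that source nodes have an input IPF of 1, that each node has a single rate change factor us/ds, and that a node's own IPF is the higher of its input and output IPFs; the function names and the example chain are hypothetical.

```python
from fractions import Fraction


def propagate_ipf(rate_change, successors, sources):
    """
    rate_change: node -> Fraction(us, ds), the node's sample rate change
    successors:  node -> list of nodes fed by its outputs
    sources:     nodes whose input IPF is taken as 1 (an assumption of this sketch)
    Returns node -> IPF at which the node executes.
    """
    ipf_in = {}

    def visit(node, ipf):
        if node in ipf_in:
            if ipf_in[node] != ipf:
                raise ValueError(f"inconsistent sample rates at {node}")
            return
        ipf_in[node] = ipf
        for nxt in successors.get(node, []):
            visit(nxt, ipf * rate_change[node])

    for source in sources:
        visit(source, Fraction(1))
    # A rate-changing node runs at the higher of its input and output rates.
    return {n: max(ipf, ipf * rate_change[n]) for n, ipf in ipf_in.items()}


def loop_counts(node_ipf):
    """C_loop = IPF_node / IPF_min for every node."""
    ipf_min = min(node_ipf.values())
    counts = {}
    for node, ipf in node_ipf.items():
        count = ipf / ipf_min
        if count.denominator != 1:
            raise ValueError(f"IPF of {node} is not an integer multiple of the minimum")
        counts[node] = int(count)
    return counts


rate_change = {
    "src": Fraction(1), "dec4": Fraction(1, 4), "fir": Fraction(1),
    "int2": Fraction(2), "sink": Fraction(1),
}
successors = {"src": ["dec4"], "dec4": ["fir"], "fir": ["int2"], "int2": ["sink"]}
node_ipf = propagate_ipf(rate_change, successors, ["src"])
loops = loop_counts(node_ipf)
print(node_ipf, loops)
```

In this chain the filter behind the decimator runs at the minimal IPF of 1/4 and is executed once per iteration, while the nodes in front of the decimator are looped four times.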
[Fig. 11. IPFs in an example graph and looped schedule. Nodes N1, N2, N3 with
IPF: 1/4, 1/4, 5/4; schedule labels "Loop 4 times" and "Loop 5 times".]
[Fig. 12. ASAP and ALAP schedules]
Scheduling 
The formally correct, typechecked graph with IPF values for all the nodes
calculated is then passed to the scheduler algorithm. The current version of
Rafael contains only the RHLS scheduler but work is under way to
implement the much more efficient Springplay scheduler (PALLER and WOLINSKI,
1995) in the software.
RHLS is an ALAP-based list scheduler which was made suitable for
heterogeneous environments. In the first step we create ASAP and ALAP
schedules in order to get the ALAP levels. We briefly present ASAP and
ALAP schedules below.
The ASAP algorithm was first presented in Hu's classical publication (HU,
1961). The ASAP scheduler starts operations as soon as all the predecessor
nodes terminate their computation, that is
$$E(n_i) = \max(E(\mathrm{pred}(n_i))) + t^{e,asap}_{n_i}, \qquad (1)$$
where $t^{e,asap}_{n_i}$ is the execution time of node $i$ and $E(n_i)$ is the
earliest time when $n_i$ can be executed. Nodes with no predecessors have
$E = 0$. This simple version is only for homogeneous architectures. The
original version supposes unlimited resources and schedules nodes exactly at
their $E$.
ALAP schedule is based on very similar principles. Nodes are sched-
uled as late as possible without increasing the length of the schedule. 
(2) 
$L(n_i)$ is the latest time when $n_i$ can be executed in the case of a
minimal length schedule. $L$ values of nodes with no successors are
initialized to the maximal $E$ value over the entire graph. Fig. 12 depicts
the ASAP and ALAP schedules of an example graph.
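The two levels can be computed with a pair of memoized recursions. The sketch below rests on assumptions that should be stated plainly: it reads E as an earliest start time, so each predecessor's own execution time is added to its E; it takes the ALAP recurrence to be the mirror image (the minimum of the successors' L minus the node's own execution time), which is the textbook form and not necessarily the authors' exact formula; and it expects an acyclic graph. The function name `asap_alap` is hypothetical.

```python
def asap_alap(exec_time, preds):
    """
    exec_time: node -> execution time (the minimal time over all processors)
    preds:     node -> list of predecessor nodes (acyclic graph assumed)
    Returns (earliest, latest) start times for every node.
    """
    succs = {n: [] for n in exec_time}
    for node, ps in preds.items():
        for p in ps:
            succs[p].append(node)

    earliest, latest = {}, {}

    def asap(n):
        if n not in earliest:
            earliest[n] = max((asap(p) + exec_time[p] for p in preds.get(n, [])),
                              default=0)            # no predecessors: E = 0
        return earliest[n]

    for n in exec_time:
        asap(n)
    horizon = max(earliest.values())                 # maximal E over the graph

    def alap(n):
        if n not in latest:
            if succs[n]:
                latest[n] = min(alap(s) for s in succs[n]) - exec_time[n]
            else:
                latest[n] = horizon                  # no successors: L = max E
        return latest[n]

    for n in exec_time:
        alap(n)
    return earliest, latest


exec_time = {"a": 2, "b": 1, "c": 3, "d": 1}
preds = {"c": ["a", "b"], "d": ["c"]}
earliest, latest = asap_alap(exec_time, preds)
slack = {n: latest[n] - earliest[n] for n in exec_time}
print(earliest, latest, slack)
```

Nodes with zero slack (a, c and d here) form the critical path; node b can start one time unit late without lengthening the schedule.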
RHLS assumes that we can always schedule the nodes on the fastest
processor possible, so the minimum execution time is supposed when building
the ASAP-ALAP schedules:
$$t^{e,asap}_{n_i} = \min(\bar{t}^{e}_{n_i}),$$
where $\bar{t}^{e}_{n}$ is the execution time vector that is composed of the
execution times of node $n$ on each processor. Then we define the urgency of
the operation $n$ as follows:
(3) 
where $t_v$ is the virtual time, which will be detailed later.
The basis of the scheduling heuristic is to assign the nodes on the
critical path to the fastest processor available. The more urgent it is to
execute a node (as delaying it would set back the execution of the whole
graph), the faster the processor it deserves. The most urgent nodes are those
which have the lowest ALAP time.
Hence we pick the node to be scheduled based on the $u_{n_i}$ urgency value
defined above (a lower urgency value means a more urgent node) and we need
the best processor to execute it. The best processor selection is very simple:
we try the node on each processor considering the communication costs and
we pick the one on which the node achieves the earliest completion time.
Before trying a node on a processor, the necessary communication activities
are scheduled tentatively so that we know how much time must be allowed
for fetching the input variables produced on other processors.
The heuristic algorithm works as follows:
Create the ready node list from nodes that have no predecessors;
while the ready list is not empty do
  for all nodes i in the ready list do
    if u(i) < minimum so far
      Candidate = node i;
  end for
  Try the candidate on each processor considering communication
  cost;
  Choose the processor on which the task achieves the earliest
  ending time;
  Schedule candidate node and the necessary communication
  activities on candidate processor;
  Update u(i)s and tv;
  Add nodes that become ready to the ready list;
end while
As the real $t_{n_i}$ node starting times will generally not be equal to the
ideal ASAP or ALAP starting times, the scheduler maintains real processor
times and the $t_v$ virtual time. The virtual time is used to track the time in
the ALAP schedule graph while the real time is the scheduling time on
the processors. The $t_v$ variable shows where we are in the ALAP schedule
graph; it is set to the lowest ALAP time among the ready nodes. The last
step is the updating of the urgency and virtual time variables.
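To make the loop concrete, here is a simplified list scheduler in the same spirit. It is a sketch under several assumptions and should not be read as the RHLS implementation: urgency is taken to be the ALAP level minus the virtual time, every transfer between two different processors costs one fixed `comm_cost`, communication activities are not placed on channels, and the names `list_schedule`, `alap` and `comm_cost` are hypothetical.

```python
def list_schedule(exec_time, preds, alap, comm_cost):
    """
    exec_time: node -> {processor: execution time}
    preds:     node -> list of predecessor nodes
    alap:      node -> ALAP level computed with minimal execution times
    comm_cost: fixed cost of moving one result between different processors
    Returns a list of (node, processor, start, end) in scheduling order.
    """
    nodes = list(exec_time)
    processors = sorted({p for times in exec_time.values() for p in times})
    proc_free = {p: 0 for p in processors}           # real time on each processor
    placed, finish, order = {}, {}, []
    ready = [n for n in nodes if not preds.get(n)]

    while ready:
        t_v = min(alap[n] for n in ready)            # virtual time
        candidate = min(ready, key=lambda n: alap[n] - t_v)   # lowest urgency value
        best = None
        for p in processors:
            # Inputs produced elsewhere arrive comm_cost after their producer ends.
            data_ready = max((finish[q] + (0 if placed[q] == p else comm_cost)
                              for q in preds.get(candidate, [])), default=0)
            start = max(proc_free[p], data_ready)
            end = start + exec_time[candidate][p]
            if best is None or end < best[0]:
                best = (end, start, p)
        end, start, p = best
        placed[candidate], finish[candidate] = p, end
        proc_free[p] = end
        order.append((candidate, p, start, end))
        ready.remove(candidate)
        ready += [n for n in nodes if n not in placed and n not in ready
                  and all(q in placed for q in preds.get(n, []))]
    return order


exec_time = {
    "a": {"c30": 4, "dsp96k": 2},
    "b": {"c30": 3, "dsp96k": 4},
    "c": {"c30": 2, "dsp96k": 2},
}
preds = {"c": ["a", "b"]}
alap = {"a": 1, "b": 0, "c": 3}
schedule = list_schedule(exec_time, preds, alap, comm_cost=1)
print(schedule)
```

In the example node b is the most urgent and goes to the processor where it finishes first; node c then ends up on the same processor because fetching b's result across processors would delay it.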
The version implemented in Rafael differs from the algorithm presented
above in that it considers the node repetition caused by multiple sample rate
loops (see the IPF checking section). The scheduler considers the effective
node execution time to be $C_{loop} \cdot t^{e,asap}_{n_i}$ and tries to group
nodes with the same IPF together.
Code Generation 
Once the scheduling is done, Rafael generates the output text for each
processor. The code generator walks the activity list on each processor and
asks the Lisp code generator database functions to produce output text for
the activities, which is then sent to the output file. Separate output files
are generated for each processor. The model of output text will be discussed
in detail in the next section.
Code Generation Model
Rafael has a parametrizable code generator that allows each section of the
generated text to be redefined. The code generator invokes Lisp functions
that receive the parameters of the text section and the device for which
the code will be generated; it is then the responsibility of these Lisp
functions to produce the appropriate text. These code pieces are called code
generator service functions and they complement the operation strategy
routines. Every text section that Rafael writes to the output text file can
be redefined by modifying either the operation strategy functions (in the
case of operation texts) or the code generator service functions (headers,
communication routine codes, etc.).
Rafael generates three text sections for each processor (any of which may be
empty). For programmable processor-like devices, which Rafael was
designed for, the database programmer may wish to realize these three
sections as subroutines. These sections are the following:
1. Constructor section. Called only once from the user program before
the first iteration of the dataflow computation.
2. Operation section. Called once for each iteration. Calling the
operation section entry label will actually execute the program generated
from the SFG.
3. Destructor section. Called once after the last iteration of the
operation section.
Each section has a start header and an end header; these would typically
contain the section's entry label and a 'return' instruction, respectively.
The sections contain the text generated by the operation constructor,
strategy and destructor functions.
If the compiled SFG was written in the conditioned block model, each
section has a separate part for each block. In the constructor and destructor
sections this is rather a formality, as Rafael guarantees no specific order
among the operators when it generates constructor and destructor sections. In
the operation section each block has a start and end header. The current
operation library realizes blocks as subroutines, so the start header defines a
block entry point label and the end header contains a 'return' statement.
The block subroutine contains the operation body texts in schedule
order. Once the block subroutines have been generated, Rafael emits the text
for the root block, which contains probe calls and block invocations. Block
invocations in the current operation library result in subroutine 'calls' to
block subroutines.
12. Conclusions 
Rafael cannot compete in complexity with the most advanced systems,
partly because of the limited capabilities of the host computer we chose,
partly because of the significantly smaller human resources we could devote to
the project. The final product, the compiler itself, has been implemented,
but many support programs that would make its usage convenient have
not even been planned. For this reason the current Rafael system is not
very 'user-friendly'. As all the resources were concentrated on the compiler
development, important parts of the system have not reached the necessary
level yet. The most important among them is the operation database,
which contains only about a dozen operations, and only for the TMS320C30 and
DSP96002 DSPs. A brave user of Rafael must face the immediate task
of filling up the database, which requires Lisp programming. Lisp is
considered a difficult language among users, although the simple functions
needed by the compiler core should be easy to implement for a somewhat
experienced programmer.
Two distinct influences can be discovered in the Rafael design. The
first one is Lee's synchronous dataflow approach and the Gabriel system,
which gave us the first notions of what Rafael should look like. We quickly
faced, however, the need for run-time decisions and the difficulties they
cause in a system based on synchronous dataflow. The second influence that
we embedded into Rafael was the way the synchronous language compilers
work and the way SynDEx transforms their output to distributed code. A
critique of the SynDEx approach was given, and a model that was easy to add
to an existing synchronous dataflow system was developed and realized.
The limits of this model were pointed out, but we consider that in many
practical cases, notably in the DSP case, they are acceptable. Further
research is being conducted to find a better way of handling dynamic
structures in a dataflow system.
So the Rafael project achieved its aims on the following points:
- A flexible multi-target SDF compiler has been realized on the PC
platform.
- Effective scheduling algorithms have been developed for the
heterogeneous case.
Rafael still has a long way to go in the following fields:
- More user-friendly environment (graph editor, database editor tools,
etc.).
- Complete database for various DSP processors.
- Better communication model.
References 
AMAGBEGNON, T. - BESNARD, L. - LE GUERNIC, P. (1994): Arborescent Canonical
Form of Boolean Expressions. INRIA Research Report, No. 2290, June 1994.
BHATTACHARYYA, S. S. - LEE, E. A. (1994): Memory Management for Dataflow
Programming of Multirate Signal Processing Algorithms, IEEE Transactions on
Signal Processing, Vol. 42, No. 5, pp. 1190-1201, May 1994.
BOURNAI, P. - LAVARENNE, C. - LE GUERNIC, P. - MAFFEIS, O. - SOREL, Y. (1994):
Interface SIGNAL-SynDEx, INRIA Research Report, No. 2206.
BOUSSINOT, F. - SIMONE, R. (1991): The ESTEREL Language, Proceedings of IEEE,
Vol. 79, No. 9, pp. 1293-1303.
BUCK, J. T. - HA, S. - LEE, E. A. - MESSERSCHMITT, D. (1991): Multirate Signal
Processing in Ptolemy, Proc. IEEE ICASSP-91, Toronto, Canada, April 1991.
BUCK, J. T. (1993): Scheduling Dynamic Dataflow Graphs with Bounded Memory
Using the Token Flow Model. Ph.D. dissertation, University of California at
Berkeley.
BUCK, J. T. - HA, S. - LEE, E. A. - MESSERSCHMITT, D. G. (1994): A Framework for
Simulating and Prototyping Heterogeneous Systems. International Journal of
Computer Simulation, special issue on Simulation Software Development,
January 1994.
GENIN, D. - HILFINGER, P. - RABAEY, J. - SCHEERS, C. - DE MAN, H. (1990): DSP
Specification Using the Silage Language, ICASSP-90, pp. 1057-1060,
Albuquerque, April 1990.
HALBWACHS, N. - CASPI, P. - RAYMOND, P. - PILAUD, D. (1991): The Synchronous
Data Flow Programming Language LUSTRE. Proceedings of IEEE, Vol. 79, No. 9,
pp. 1305-1319, September 1991.
LANNEER, D. (1993): Design Models and Data-Path Mapping for Signal Processing
Architectures. Ph.D. dissertation, Katholieke Universiteit Leuven, March 1993.
LEARY, K. W. - WADDINGTON, W. (1990): DSP/C: A Standard High Level Language for
DSP and Numeric Processing, IEEE ICASSP-90, pp. 1065-1068, Albuquerque, New
Mexico, April 1990.
LEE, E. A. - MESSERSCHMITT, D. G. (1987): Static Scheduling of Synchronous Data
Flow Programs for Digital Signal Processing. IEEE Transactions on Computers,
pp. 25-35, Vol. C-36, No. 1, January 1987.
LEE, E. A. - HO, W.-H. - GOEI, E. E. - BIER, J. C. - BHATTACHARYYA, S. (1989):
Gabriel: A Design Environment for DSP. IEEE Trans. on Acoustics, Speech and
Signal Processing, Vol. 37, No. 11, November 1989.
LE GUERNIC, P. - GAUTIER, T. - LE BORGNE, M. - LE MAIRE, C. (1991): Programming
Real-Time Applications with SIGNAL. Proceedings of IEEE, Vol. 79, No. 9,
pp. 1321-1336, September 1991.
HU, T. C. (1961): Parallel Sequencing and Assembly Line Problems. Oper. Res.,
Vol. 9, pp. 841-848, November 1961.
DE MAN, H. - RABAEY, J. - SIX, P. - CLAESEN, L. (1986): Cathedral-II: A Silicon
Compiler for Digital Signal Processing. IEEE Design & Test, pp. 13-24,
December 1986.
MESSERSCHMITT, D. G. (1984): A Tool for Structured Functional Simulation. IEEE J.
Selected Areas of Communication, Vol. SAC-2, Jan. 1984.
PALLER, G. - WOLINSKI, C. (1995): A New Class of Compile-Time Scheduling
Algorithm for Heterogeneous Target Architectures. IFAC/IFIP Workshop on Real
Time Programming, Fort Lauderdale, November 1995.
SOREL, Y. (1994): Massively Parallel Computing [...] Real Time Constraints [...]
The Algorithm Architecture Adequation [...], Ischia, May 1994.
