For the design of complex digital signal processing systems, block diagram oriented synthesis of real time software for programmable target processors has become an important design aid. The synthesis approach discussed in this paper is based on multirate block diagrams with scalable synchronous dataflow (SSDF) semantics. For this class of dataflow graphs we present scheduling techniques for optimum data memory compaction. These techniques can be employed to map signals of a block diagram onto a minimum data memory space. In order to formalize the data memory compaction problem, we first derive appropriate implementation measures. Based on these implementation measures it can be shown that optimum data memory compaction consists of optimum scheduling as well as optimum memory allocation. For the class of single appearance (SA) block diagrams with SSDF semantics, scheduling can be reduced to an integer linear programming (ILP) problem. Due to the computational complexity of ILP, we also present a suboptimum scheduling selection criterion, which can be used for SA and non-SA schedulers.
INTRODUCTION
Memory compaction is an important optimization technique for systems with memory resource constraints. Especially in the implementation of digital signal processing systems we often find memory constraints due to the limited on-chip memory of programmable target architectures.
In this paper we focus on data memory compaction in the context of the synthesis of real time software using a block diagram specification of a signal processing system. As target processors, digital signal processors (DSPs) are of special interest because of architectural features tailored to the specific needs of signal processing tasks. DSPs have stringent on-chip program memory limits, and off-chip memory access is in general inefficient. In the case of commercially available DSP cores, the on-chip RAM/ROM memory space may be adapted to the application-specific needs. Since the size of the required memory space determines the cost, it is important to minimize it.
The block diagrams used for software synthesis are dataflow oriented and consist of blocks and signals. From the implementation point of view, blocks are software modules (supplied by the user or the system) and signals are FIFO buffers in the data memory space. If each block of a block diagram consumes and produces a fixed number of data samples, the block diagram is based on the "synchronous data flow" (SDF) [1] paradigm. These numbers, called rates in the sequel, must be specified a priori, e.g. at configuration time. Up- or downsampling within a block results in multirate block diagrams. If all blocks of a block diagram may consume and produce any integer multiple of the predefined SDF rates per activation, we call the SDF graph scalable, resulting in a scalable synchronous dataflow (SSDF) graph [2]. Due to the scalability, SSDF block diagrams can be optimally vectorized [3]. Vectorization in the context of SSDF graphs is regarded as a transformation on an SSDF graph raising the number of consumed and produced samples per activation to a certain integer multiple of the predefined SDF rates. Because of the instruction and/or arithmetic pipelining of DSPs, vectorization leads to enhanced throughput of the synthesized software. Because of the increased vector lengths, vectorization also increases data memory consumption, which can be drastically reduced by the proposed data memory compaction.
The heuristic minimization of data memory consumption by means of looped schedules for SDF block diagrams has been discussed in [4]. The approach therein minimizes the vectorization opportunities for each block and is thus suited to applications where throughput is not the primary optimization goal.
In section 2 we introduce some of the basic formalisms of SSDF graphs. This is followed by the presentation of some implementation measures which serve as optimization criteria. In section 4 the optimum scheduling problem is treated. Afterwards we derive a suboptimum scheduling heuristic and finally demonstrate the scheduling strategies by means of an application example.
SCALABLE SYNCHRONOUS DATAFLOW
We suppose that a digital signal processing system is specified by means of a scalable synchronous block diagram. In this paper we are interested in infinite schedules, where each block is activated infinitely often. For the software synthesis, we have to guarantee that an infinite valid schedule can be implemented with finite memory space for each signal. A sufficient condition for finite memory is that each block $b_j$ of a scalable synchronous block diagram $F$ is executed at least $q_F(b_j)$ times for one schedule period $\Psi$ [1], which can be repeated infinitely often. This condition ensures that in each schedule period as many samples are written to each signal as are read from it. For multirate block diagrams, $q_F(b_j) > 1$ holds for at least one block.
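The balance condition above (as many samples written as read per schedule period) can be illustrated with a minimal sketch that computes the smallest integer number of executions per block. The function name `repetition_vector`, the tuple layout of `signals` and the three-block example are assumptions made for illustration only; they do not appear in the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(blocks, signals):
    """Smallest positive integer executions per block and schedule period.

    signals: list of (producer, produced_rate, consumer, consumed_rate).
    """
    # Propagate relative firing rates along the signals, starting anywhere.
    rate = {blocks[0]: Fraction(1)}
    pending = list(signals)
    while pending:
        progressed = False
        for sig in list(pending):
            src, p, dst, c = sig
            if src in rate and dst in rate:
                # Balance equation: samples written must equal samples read.
                if rate[src] * p != rate[dst] * c:
                    raise ValueError("inconsistent rates: no finite-memory schedule")
            elif src in rate:
                rate[dst] = rate[src] * p / c
            elif dst in rate:
                rate[src] = rate[dst] * c / p
            else:
                continue
            pending.remove(sig)
            progressed = True
        if not progressed:
            raise ValueError("block diagram is not connected")
    if len(rate) != len(blocks):
        raise ValueError("block diagram is not connected")
    # Scale the rational solution to the smallest positive integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    ints = {b: int(r * scale) for b, r in rate.items()}
    common = reduce(gcd, ints.values())
    return {b: v // common for b, v in ints.items()}

# Hypothetical multirate chain: A writes 2, B reads 3 and writes 1, C reads 2.
q = repetition_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)])
print(q)
```

In this example A runs three times, B twice and C once per period, so each signal receives exactly as many samples as are removed from it.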
The vectorization of a given scalable synchronous block diagram determines the local blocking factor $n(b_j)$ for each block $b_j$. Since there is a unique local blocking factor for each block in case of SA-schedules, vectorization can also be regarded as a transformation on $F$ increasing the rates of the input and output ports of the blocks. The vectorization is valid if there still exists a valid schedule after vectorization. For all blocks the minimum number of executions $q_F(b_j)$ per schedule period can be increased by the global blocking factor $N_g$, which describes the global vectorization degree. In the sequel we assume that the block diagram has an associated SA-schedule and is vectorized, i.e. each block has a local blocking factor assigned. We also restrict the discussion to flat SA-schedules, which means that we do not consider looped schedules.
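As a rough illustration of vectorization as a rate transformation, the sketch below scales the port rates of each block by a local blocking factor. It assumes a flat SA-schedule in which every block is activated exactly once per period, so that the local blocking factor equals the global blocking factor times the minimum number of executions; this relation, the helper name `vectorize` and the example values are assumptions for illustration.

```python
def vectorize(q, signals, n_g):
    """Scale SDF port rates by local blocking factors (flat SA-schedule assumed).

    q:       minimum executions per block and period, e.g. {"A": 3, ...}
    signals: list of (producer, produced_rate, consumer, consumed_rate)
    n_g:     global blocking factor
    """
    # Assumption: one activation per block and period, hence n(b) = n_g * q(b).
    n = {block: n_g * reps for block, reps in q.items()}
    vectorized = []
    for src, p, dst, c in signals:
        produced = n[src] * p
        consumed = n[dst] * c
        # A valid vectorization keeps every signal balanced.
        if produced != consumed:
            raise ValueError(f"signal {src}->{dst} is unbalanced after vectorization")
        vectorized.append((src, produced, dst, consumed))
    return n, vectorized

# Hypothetical example values.
q = {"A": 3, "B": 2, "C": 1}
signals = [("A", 2, "B", 3), ("B", 1, "C", 2)]
n, vec_signals = vectorize(q, signals, n_g=4)
print(n)
print(vec_signals)
```

The vectorized rates double as buffer lengths here: without initial samples, a signal in a flat schedule has to hold everything its producer writes in one activation, which is why vector length and data memory consumption grow together.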
IMPLEMENTATION MEASURES FOR DATA MEMORY CONSUMPTION
In order to derive optimum strategies for data memory compaction we first have to define the optimization criterion. A buffer is said to be static iff the write resp. read access of the incident block occurs at the same memory offset in each schedule period. Static buffers can be efficiently synthesized, since blocks accessing those buffers just need a constant pointer to the memory segment allocated for the buffers. It can be shown that this optimization problem is a nonlinear scheduling problem. Note that the memory consumption can be optimized only by minimizing the lengths of the buffers corresponding to these signals and not by sharing memory between dynamic buffers. Dynamic buffers in general cannot be mapped onto shared memory segments since write and read accesses are scattered over the whole buffer. In the sequel we will concentrate on optimally sharing memory for signals with $D_0(s_i) = 0$. Signals $s_i$ for which $D_0(s_i) = 0$ holds will be denoted as $s_i^0$. This optimization problem is regarded as more important, since in general there are many more signals without initial samples, especially after transformations like retiming, where initial samples are explicitly concentrated such that vectorization is optimized [5].
In order to take the effects of shared buffers into account, another implementation measure is of interest. The number of signal samples present in all signals $s_i$ with $D_0(s_i) = 0$ at schedule step $k$ after activation $a_k = (b_j, n(b_j))$ can be described by the number of live signal samples $M_l(k)$. In a second step the signals of the block diagram have to be mapped onto memory segments. In fig. 2 the signals of the block diagram of fig. 1 are mapped onto memory according to the live times, which can be determined from the schedule. For both schedules exactly $M_{act}(\Phi_{1,2})$ memory is needed. Notice that optimum data memory compaction is a two-step optimization problem. First, a schedule has to be found which yields minimum $M_{act}(\Phi)$, and second, all signal buffers have to be mapped onto memory such that buffers optimally share memory and $M_{act}(\Phi)$ is the amount of data memory needed. In the sequel we derive the optimum scheduling problem. Optimum memory allocation based on optimum scheduling will be presented in a forthcoming paper.
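To make the notion of live signal samples concrete, the following sketch traces a flat SA-schedule and records the number of samples present after each activation. The exact measure is not recoverable from the text, so the sketch assumes that the outputs of a block are allocated while its inputs are still held, and takes the peak over all activations. The function name, the signal layout and the five-block diagram are hypothetical.

```python
def live_sample_trace(schedule, signals):
    """Return (live samples after each step, peak during any activation).

    schedule: block names in activation order (flat SA-schedule, one entry each)
    signals:  (source block, sink block, samples per period), no initial samples
    """
    done = set()
    live = 0
    after_step = []
    peak = 0
    for block in schedule:
        inputs = [s for s in signals if s[1] == block]
        outputs = [s for s in signals if s[0] == block]
        # Precedence: every producer of an input must already have run.
        if any(src not in done for src, _, _ in inputs):
            raise ValueError(f"{block} activated before its inputs are available")
        produced = sum(n for _, _, n in outputs)
        consumed = sum(n for _, _, n in inputs)
        # Assumption: outputs are allocated while the inputs are still held.
        peak = max(peak, live + produced)
        live += produced - consumed
        after_step.append(live)
        done.add(block)
    return after_step, peak

# Hypothetical vectorized diagram: two branches feeding a common sink e.
SIGNALS = [("a", "b", 8), ("b", "e", 1), ("c", "d", 4), ("d", "e", 1)]

trace_1, peak_1 = live_sample_trace(["a", "b", "c", "d", "e"], SIGNALS)
trace_2, peak_2 = live_sample_trace(["a", "c", "b", "d", "e"], SIGNALS)
print(trace_1, peak_1)  # [8, 1, 5, 2, 0] 9
print(trace_2, peak_2)  # [8, 12, 5, 2, 0] 13
```

Both orders are valid schedules of the same diagram, yet the second keeps the two large buffers alive simultaneously and therefore needs more shared memory under this measure.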
SCHEDULING FOR OPTIMUM DATA MEMORY COMPACTION
Given is a block diagram $F$ for which an optimum SA-schedule is to be found. In order to derive the number of live samples after schedule step $n$ we define the cost matrix $\Gamma$. With this cost matrix we can describe the implementation measure $M_{act}(\Phi)$:
The vector $\mathbf{1}$ denotes the unity vector and $a(\cdot)$ the scheduling vector $a(n) = (a_{1,n}, a_{2,n}, \ldots)$. Since we are interested in a sequential schedule, only one block may be activated at time $k$:
The last class of constraints follows from the signals of the block diagram, which can be regarded as simple precedence relations between adjacent blocks $b_l = A(s_i)$, $b_j = E(s_i)$, since $F$ has an associated SA-schedule:
Thus optimization criterion 4 together with the constraints 4, 13 and 14 forms an integer linear programming problem, which can be solved using standard software packages.
Since ILP is NP-hard, we introduce a novel scheduling heuristic, which is derived from the above ILP.
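For very small block diagrams, the optimum flat SA-schedule can also be found by exhaustive search, which may serve as a reference when checking an ILP model or a heuristic. The sketch below is not the ILP formulation of this section: it simply enumerates all block orders that respect the precedence relations and keeps the one with the smallest peak of live samples, where the peak measure (outputs allocated while inputs are still held) and the example diagram are assumptions for illustration.

```python
from itertools import permutations

def peak_live_samples(order, signals):
    """Peak number of live samples for a precedence-respecting block order."""
    live = 0
    peak = 0
    for block in order:
        produced = sum(n for src, _, n in signals if src == block)
        consumed = sum(n for _, dst, n in signals if dst == block)
        # Assumption: outputs are allocated while the inputs are still held.
        peak = max(peak, live + produced)
        live += produced - consumed
    return peak

def optimum_flat_schedule(blocks, signals):
    """Exhaustive search over all flat SA-schedules (factorial cost)."""
    best = None
    for order in permutations(blocks):
        position = {block: i for i, block in enumerate(order)}
        # Precedence constraints: each producer runs before its consumer.
        if any(position[src] >= position[dst] for src, dst, _ in signals):
            continue
        cost = peak_live_samples(order, signals)
        if best is None or cost < best[0]:
            best = (cost, order)
    return best

# Hypothetical vectorized diagram: two branches feeding a common sink e.
BLOCKS = ["a", "b", "c", "d", "e"]
SIGNALS = [("a", "b", 8), ("b", "e", 1), ("c", "d", 4), ("d", "e", 1)]

best_cost, best_order = optimum_flat_schedule(BLOCKS, SIGNALS)
print(best_cost, best_order)  # 9 ('a', 'b', 'c', 'd', 'e')
```

The factorial growth of the search space is the practical reason for looking at a selection heuristic instead.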
SUBOPTIMUM SCHEDULING FOR MINIMUM DATA MEMORY CONSUMPTION
In the following we present a scheduling criterion which decides at time $k$ which of the blocks to activate next. This criterion can be used for S-class scheduling algorithms, which successively schedule a block depending on whether there are enough input samples available for it. Although we restrict the discussion to SA-schedules, the presented criterion can also be used for multiple appearance schedules.
Due to eq. (4) and (7), scheduling a block $b_j$ at step $n$ determines the number of live samples $M_l(n)$ and the number of output samples for which additional memory space has to be allocated. Thus if at time $n-1$ more than one block can be activated, we have to select one of these blocks in a way that $M_{act}(\Phi)$ is minimized.
Given now a candidate set of blocks
which all may be activated at time $n$, we can split this set into two sets. The first set $C_1$ includes all blocks whose activation does not increase the number of live samples.
The second set $C_2$ includes all blocks which increase the number of live samples upon their activation, i.e. $M_l(k) > M_l(k-1)$, $k > n$:
We now regard all $|C_1|!$ permutations of blocks of $C_1$ which form a valid possible subschedule
for blocks of $C_1$ and ignore the previous activations $a_k$ with $k \le n$. Then the subschedule $\Phi_{opt} = \{\Psi_1, \Psi_2\}$ with
exhibits the minimum number of maximum live samples:
Since we have only regarded all subschedules $\Phi$, this might be a local but not a global minimum concerning all $|C_2|!$ permutations of valid schedules $\Psi_2$. Thus a scheduling algorithm based on this heuristic minimization criterion simply has to insert each successor of the last scheduled block which can be activated into $C_1$ or $C_2$ and has to sort within these sets according to rule 21 or rule 22. In figure 3 an example for such a heuristic schedule is given. This schedule is also the optimum one.
An example for the suboptimality of the selection criterion can be seen in the schedule for the block diagram of figure 1. At $n = 1$, $C_1 = \{b_4\}$ and $C_2 = \{b_2\}$. Following the above selection criterion, block $b_4$ is scheduled before $b_2$, which is suboptimal.
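The selection criterion can be sketched as a greedy list scheduler that splits the activatable blocks into the two sets and prefers the set that does not increase the number of live samples. The sorting rules 21 and 22 are not reproduced in the text, so the ordering used here (largest reduction first within the first set, smallest increase first within the second) is an assumption, as are the function names and the example diagram. Unlike the incremental insertion of successors described above, the sketch recomputes the candidate set at every step.

```python
def sample_delta(block, signals):
    """Change in live samples caused by activating the block."""
    produced = sum(n for src, _, n in signals if src == block)
    consumed = sum(n for _, dst, n in signals if dst == block)
    return produced - consumed

def peak_live_samples(order, signals):
    """Peak of live samples, outputs allocated while inputs are still held."""
    live = peak = 0
    for block in order:
        produced = sum(n for src, _, n in signals if src == block)
        peak = max(peak, live + produced)
        live += sample_delta(block, signals)
    return peak

def greedy_schedule(blocks, signals):
    """Greedy flat SA-schedule based on the two candidate sets."""
    scheduled = []
    remaining = list(blocks)
    while remaining:
        done = set(scheduled)
        # Candidate set: blocks whose producers have all been scheduled.
        ready = [b for b in remaining
                 if all(src in done for src, dst, _ in signals if dst == b)]
        if not ready:
            raise ValueError("no block can be activated")
        key = lambda b: (sample_delta(b, signals), b)
        # First set: activation does not increase the live samples.
        c1 = sorted((b for b in ready if sample_delta(b, signals) <= 0), key=key)
        # Second set: activation increases the live samples.
        c2 = sorted((b for b in ready if sample_delta(b, signals) > 0), key=key)
        chosen = c1[0] if c1 else c2[0]
        scheduled.append(chosen)
        remaining.remove(chosen)
    return scheduled

# Hypothetical vectorized diagram: two branches feeding a common sink e.
BLOCKS = ["a", "b", "c", "d", "e"]
SIGNALS = [("a", "b", 8), ("b", "e", 1), ("c", "d", 4), ("d", "e", 1)]

heuristic = greedy_schedule(BLOCKS, SIGNALS)
print(heuristic, peak_live_samples(heuristic, SIGNALS))
print(peak_live_samples(["a", "b", "c", "d", "e"], SIGNALS))
```

On this toy diagram the sketch yields the order c, d, a, b, e with a peak of 10 live samples, whereas the order a, b, c, d, e needs only 9. A purely local selection can therefore miss the optimum, in line with the suboptimality noted above.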
APPLICATION
As an example for the derived scheduling methods, a realistic application example is given. In fig. 4, the block diagram of a mobile satellite receiver is shown [6].
CONCLUSION
We have presented a novel optimum scheduling approach resulting in a minimum of data memory consumption for single appearance constellations. This approach has been identified as an ILP problem, driving us to define an efficient heuristic. For a realistic application this heuristic has been shown to result in a minor degradation of data memory efficiency, offering a reliable complexity estimation within a significantly reduced amount of time.
