Abstract: When designing DSP applications for implementation on field programmable gate arrays (FPGAs), it is often important to minimize consumption of limited FPGA resources while satisfying real-time performance constraints. In this paper, we develop efficient techniques to determine dataflow graph buffer sizes that guarantee throughput-optimal execution when mapping synchronous dataflow (SDF) representations of DSP applications onto FPGAs. Our techniques are based on a novel two-actor SDF graph model (TASM), which efficiently captures the behavior and costs associated with SDF graph edges (flowgraph connections). With our proposed techniques, designers can automatically generate upper bounds on SDF graph buffer distributions that realize maximum achievable throughput performance for the corresponding applications. Furthermore, our proposed technique has low polynomial time complexity, which is useful for rapid prototyping in DSP system design.
I. INTRODUCTION AND RELATED WORK
In the design of DSP applications, it is important to consider real-time constraints as well as optimization of hardware resources. Synchronous dataflow (SDF) [1] has been used widely as an efficient model of computation for analyzing performance and resource requirements of DSP applications that are implemented on various target architectures (e.g., see [2] , [3] , [4] , [5] , [6] ).
When describing a DSP application with an SDF graph, functional blocks and storage space for transferring data between adjacent blocks are modeled as graph vertices (actors) and edges, respectively. When mapping dataflow graph edges into storage locations, care must be taken to make effective use of limited storage locations (e.g., on-chip memory in programmable digital signal processors, and block RAM and distributed memory in FPGAs). However, reducing the storage space for transferring data between actors may result in decreased throughput due to idle time that is required to prevent buffer overflow: as buffers become smaller, the frequency and duration of such overflow-avoiding idle time generally increase, which leads to decreased throughput. The limited amounts of storage available in DSP implementation targets and the importance of meeting real-time performance constraints motivate the goal of guaranteed, throughput-optimal buffer configuration for SDF graphs. In this paper, we study this problem in the context of FPGA-based implementation.
Traditionally, throughput analysis for SDF graphs is performed by solving an instance of the maximum mean cycle problem (e.g., see [7] , [8] ) after converting the input SDF graph into an equivalent homogeneous SDF (HSDF) graph [1] . HSDF is a special case of SDF in which the production and consumption rates are identically equal to unity for all input and output ports of all actors. These rates are in terms of data values (tokens) per actor execution (firing). Throughput analysis based on SDF-to-HSDF conversion suffers from high worst case complexity because neither the time nor space required to perform this conversion is polynomially bounded (e.g., see [9] ). This complexity arises from the nature of periodic schedules of SDF graphs, which are used for static scheduling. A periodic schedule for an SDF graph is a schedule that produces no net change in the buffer state, i.e., the numbers of tokens that are queued on the buffers associated with the graph edges. The total number of actor firings in a periodic schedule can scale exponentially even for simple classes of SDF graphs [9] . Since each actor firing corresponds to a separate vertex in the HSDF version of an SDF graph, the SDF-to-HSDF transformation process can result in similar exponential growth.
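The exponential growth of periodic schedules noted above can be illustrated with a small sketch. The function below is a hypothetical helper (not part of the DIF package or of any cited work) that solves the balance equations q[k] * p = q[k+1] * c along a chain-structured SDF graph and returns the minimal repetition vector. For a chain in which every edge has production rate 1 and consumption rate 2, the number of firings per schedule period roughly doubles with each added edge.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def chain_repetition_vector(rates):
    """Minimal repetition vector of a chain-structured SDF graph.

    rates[k] = (p, c) gives the production and consumption rates of the
    edge from actor k to actor k + 1 (hypothetical input format).
    """
    # Solve q[k] * p == q[k + 1] * c with rational arithmetic.
    q = [Fraction(1)]
    for p, c in rates:
        q.append(q[-1] * p / c)
    # Scale to the smallest all-integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in q), 1)
    ints = [int(f * scale) for f in q]
    g = reduce(gcd, ints)
    return [n // g for n in ints]


# Ten edges with p = 1, c = 2: one period contains 2^11 - 1 firings.
print(sum(chain_repetition_vector([(1, 2)] * 10)))  # 2047
```

Each of these firings becomes a separate vertex after SDF-to-HSDF conversion, which is why the conversion is not polynomially bounded in the size of the SDF graph.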
Oh [10] develops a new single appearance schedule (SAS) approach for SDF graphs to jointly minimize data memory and code memory size. While this approach provides powerful analysis for software synthesis that is targeted to single processor architectures, it does not allow for simultaneous firings of multiple actors. In our context of hardware synthesis to FPGA targets, where each actor is mapped to a separate computing unit, allowing simultaneous firings is a key feature in maximizing the achieved throughput.
Stuijk [11] develops a systematic approach for exploring throughput and storage trade-offs for SDF graphs. This approach applies methods developed in [12] for determining minimum storage requirements based on state-space analysis of buffer states. Stuijk's approach operates by first finding a minimal storage distribution, and then recursively increasing the storage space for each edge that has a storage dependency. This results in a family of buffer distribution-throughput pairs as a representation of Pareto solutions for the graph. Although this approach prunes the search space to reduce complexity, schedule simulation is still required in the search process, so again, worst case complexity is not polynomially bounded.
Wiggers [13] presents an algorithm with linear computational complexity to determine close-to-minimum buffer capacities for a given throughput constraint. However, this approach imposes a form of strictly periodic scheduling that requires a counter in every functional block, which leads to resource overhead in FPGA and other hardware-oriented implementations. Also, since Wiggers's approach assumes that execution will enter the required periodic steady-state only with the timely availability of sufficient starting tokens for every actor, it may not adequately handle irregular streaming inputs, where token arrival times are less predictable.
In contrast to the related prior work, we propose a heuristic algorithm with low polynomial time complexity that provides upper bounds on buffer requirements to guarantee throughput-optimal FPGA realizations of SDF graphs. Our approach focuses on the restricted class of tree-structured SDF graphs; that is, the input application model (application graph) must be in the form of an SDF tree. We emphasize that our algorithm is a heuristic only in the sense of the buffer sizes that are computed; in terms of achieved throughput performance, our approach guarantees optimality.
We first analyze relationships of firing patterns between actors and buffer requirements for the two-actor SDF graph model (TASM), which is a specialized form of SDF graph that we propose for efficient analysis of data communication on individual edges in a given SDF application graph. We then apply this two-actor firing pattern analysis repeatedly when traversing an application graph to determine buffer configurations that guarantee maximum achievable throughput.
In our buffer optimization scenario and our associated TASM analysis, we consider self-timed dataflow graph execution (e.g., see [14] , [15] ), which means that an actor is fired as soon as all of its input edges have enough tokens, that is, as soon as the number of tokens on each input edge e is at least c(e). If each actor is mapped to a separate hardware resource, and the overhead of communication and synchronization between actors is negligible, then self-timed execution leads to the maximum achievable throughput (e.g., see [16] , [15] ). Moreover, this form of execution does not require any global schedule, and therefore the storage, performance, and interconnect overhead associated with implementing a global schedule is avoided.
With predominantly coarse-grained dataflow actors (e.g., digital filters and transform computations, as opposed to adders and multipliers), and streamlined implementation of dataflow edges, one can reduce the relative overhead of inter-actor communication and synchronization significantly so that self-timed scheduling becomes an effective approach. This context of coarse-grained actors and streamlined edge implementation is the form in which we explore self-timed implementation and associated buffer configuration strategies in this paper.
We first present precise definitions and notation related to buffer analysis of SDF-based implementations. Using these concepts, we analyze the data transfer behavior on an SDF edge with the TASM model described earlier. Based on this analysis, we develop an algorithm for buffer analysis, and we show an overall design flow for applying this algorithm for efficient synthesis of FPGA implementations. The proposed algorithm is implemented in the dataflow interchange format (DIF) package, which provides a standard language and associated toolset that is founded in dataflow semantics and tailored for DSP system design [5] .
II. BACKGROUND

A. Application representation
We represent a DSP application with a dataflow graph G = (V, E), where each computational module is mapped to a vertex (actor) v ∈ V and each directed edge e ∈ E corresponds to a FIFO buffer for communicating data from the source actor src(e) to the sink actor snk (e) of e. We assume that the given dataflow model adheres to the assumptions of SDF, which require that the production and consumption rates of all actor output and input ports, respectively, are constant [1] . The SDF model is used widely in tools for DSP system design, and powerful analysis techniques have been developed for mapping SDF representations into various kinds of platforms (e.g., see [15] ).
Given an SDF edge e, we represent the associated production rate of src(e) by p(e), and we represent the associated consumption rate of snk (e) by c(e). An SDF edge e also has associated with it a non-negative delay, denoted del (e), which represents the number of initial tokens that reside on the corresponding buffer at the start of execution.
A necessary condition for executing (firing) an SDF actor v is that the number of tokens on every input edge e in of v is greater than or equal to c(e in ). During a single invocation (firing), v consumes c(e in ) tokens from each input edge e in and produces p(e out ) tokens onto each output edge e out .
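The firing rule above can be expressed as a minimal sketch. The dictionary-based representation and the names `can_fire` and `fire` are assumptions made for illustration; they do not correspond to any particular tool.

```python
def can_fire(tokens, in_edges, cons):
    """True if every input edge holds at least c(e_in) tokens."""
    return all(tokens[e] >= cons[e] for e in in_edges)


def fire(tokens, in_edges, out_edges, cons, prod):
    """Apply one firing: consume c(e_in) per input, produce p(e_out) per output."""
    if not can_fire(tokens, in_edges, cons):
        raise ValueError("actor is not enabled")
    for e in in_edges:
        tokens[e] -= cons[e]
    for e in out_edges:
        tokens[e] += prod[e]


# One actor with input edge "a" (c = 2) and output edge "b" (p = 3).
tokens = {"a": 2, "b": 0}
fire(tokens, ["a"], ["b"], cons={"a": 2}, prod={"b": 3})
print(tokens)  # {'a': 0, 'b': 3}
```

Note that this sketch models only the token bookkeeping of SDF; it does not yet account for bounded buffers, which the TASM introduced in Section IV addresses.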
III. TARGET PLATFORM MODEL
Since resource sharing is often avoided in FPGA implementation due to the relatively high cost of multiplexing and routing resources (e.g., see [17] ), we assume that each computational block (SDF actor) is assigned to a dedicated set of FPGA logic cells without any sharing. Integrating resource sharing considerations into the developments of this paper is an interesting direction for future work, and may be useful in cases where resources are limited compared to the amount of required computation.
FPGAs provide two ways of implementing memory space between functional blocks: block RAMs, which provide dedicated memory hardware within an FPGA, and distributed RAM, which is implemented using FPGA slices. The number of ports for reading (writing) data from (into) both forms of RAM is limited, and these limitations must be taken into account carefully for correct buffer management. In the Xilinx Virtex-II Pro FPGA, which we target in this paper, the number of ports is limited to two, and therefore, only a single pair of simultaneous read/write operations to each RAM subsystem is possible.
Our overall mapping approach therefore maps each actor in the SDF application model to a single actor (dedicated hardware resource) in the architecture model, along with a self-loop connection for that actor. Thus, we allow for concurrent execution of distinct actors, while serializing successive invocations of the same actor since such successive invocations must access the same memory ports for buffer access.
IV. TWO-ACTOR SDF GRAPH MODEL (TASM)
We assume a static buffering approach for SDF graphs, which means that for each SDF edge we allocate a fixed amount of memory space at compile time. We refer to the fixed amount of space that is allocated for an edge e i as the buffer size of e i , and we denote this buffer size by the symbol D(e i ). For real-time implementation of SDF graphs, static buffering is often preferable due to its enhanced predictability and elimination of overhead due to dynamic memory allocation.
In this section, we introduce a model called the Two-Actor SDF Graph Model (TASM). For any edge e i ∈ E in an arbitrary SDF graph G = (V, E), the TASM for e i , analyzed in terms of SDF semantics, accurately captures the token transfer across e i as the enclosing SDF graph G executes under bounded memory. Also, TASM facilitates the formalization of our proposed synthesis approach, and its feature of computing buffer space requirements for throughput-optimal implementation.
A. Two-actor SDF graph model (TASM)
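Suppose that edge e i , shown in Fig. 1 , is part of some arbitrary enclosing SDF graph G = (V, E) (i.e., e i ∈ E), and suppose that src(e i ) = v src , snk (e i ) = v snk , del (e i ) = d i , and the production and consumption rates of e i are denoted by p(e i ) and c(e i ), respectively. Suppose also that e i is assigned a pre-specified buffer size D(e i ). Then the TASM graph associated with e i , which we denote by G T i , is defined as illustrated in Fig. 1(b) . Here, v T src and v T snk are the TASM actors that correspond to v src and v snk , respectively, and e T (i, 1 ) and e T (i, 2 ) are the TASM edges that model the filled and free space, respectively, of the buffer for e i . The production and consumption rates for edges in the TASM graph are set as follows.
$$p(e_T(i,1)) = c(e_T(i,2)) = p(e_i), \qquad (1)$$
and
$$c(e_T(i,1)) = p(e_T(i,2)) = c(e_i). \qquad (2)$$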
At any given time, buffer slots (cells in the memory that are allocated for the buffer) are categorized into two types based on whether they contain live data (filled) or whether they are available for storing new data (empty or free). The filled space in the buffer for e i is modeled by e T (i, 1 ) in TASM. Thus as G T i executes, each token on e T (i, 1 ) represents a live token in the buffer associated with e i in a corresponding execution of G. Since the source actor src(e i ) can be fired only when e i has enough free space to store all of the tokens produced by a firing of src(e i ), each firing of src(e i ) can be viewed as consuming p(e i ) free cells from the buffer space available on e i . Conversely, each execution of snk (e i ) expands the free space on e i by c(e i ) cells. Hence, the free space on e i can be modeled by the edge e T (i, 2 ) shown in Fig. 1(b) , where each token on e T (i, 2 ) during an execution of G T i represents an empty cell in the buffer associated with e i in a corresponding execution of G.
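As a minimal sketch of this construction, the function below builds the two TASM edges for a single SDF edge. The `Edge` record and the name `build_tasm` are hypothetical. The initial token count D(e i ) − d i on the free-space edge is an assumption inferred from the description above (every cell that does not initially hold a live token is initially free).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    snk: str
    prod: int   # tokens produced per firing of src
    cons: int   # tokens consumed per firing of snk
    delay: int  # initial tokens


def build_tasm(p, c, d, D):
    """Return (filled-space edge, free-space edge) for one SDF edge.

    p, c, d are the rates and delay of the original edge; D is its
    pre-specified buffer size.
    """
    if D < d:
        raise ValueError("buffer cannot hold the initial tokens")
    # e_T(i,1): live tokens flowing from source to sink.
    filled = Edge("v_src_T", "v_snk_T", prod=p, cons=c, delay=d)
    # e_T(i,2): free cells flowing from sink back to source
    # (assumed initial count: D - d).
    free = Edge("v_snk_T", "v_src_T", prod=c, cons=p, delay=D - d)
    return filled, free


filled, free = build_tasm(p=2, c=3, d=1, D=7)
print(filled.delay + free.delay)  # 7
```

With this encoding, the ordinary SDF firing rule applied to both TASM edges enforces the bounded-buffer constraint without any extra mechanism.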
B. Modified self-timed execution (MSTE) in TASM
We use the self-timed execution model when mapping the input SDF graph into an FPGA implementation. Self-timed execution of SDF graphs can in general lead to execution periods (the patterns in which actors execute on the available resources) that are of exponential length in terms of the size of the graph (e.g., see [15] ). Such exponential growth of execution periods can significantly complicate static analysis. To help address this difficulty, we add an additional firing rule, which we call the MSTE firing rule: the source actor v T src (see Fig. 1(b) ) of the TASM model cannot be fired if
$$\max(p(e_i), c(e_i)) \le \tau_1(t) < p(e_i) + c(e_i), \qquad (3)$$
where τ j (t) represents the number of tokens on e T (i, j ) at time t for j = 1, 2. By imposing the MSTE firing rule, we obtain a modified form of self-timed execution, which we refer to in the remainder of this paper as modified self-timed execution (MSTE).
We have empirically observed that this additional firing rule usually results in only relatively minor deviations from self-timed execution. However, imposing the rule leads to a periodic execution pattern S P that is defined by the repetition vector of G. More precisely, by S P in this context, we mean a finite-duration schedule onto the disjoint subsets, r src and r snk , of FPGA resources that are occupied by the actors v src and v snk , respectively. In other words, S P can be viewed as a mapping
$$S_P : \{0, 1, \ldots, t_i - 1\} \times \{r_{src}, r_{snk}\} \rightarrow \{v^T_{src}, v^T_{snk}, v_{idle}\},$$
where t i is the length of the schedule (the period of the periodic pattern), and v idle represents a void computation (idle resource). Note that even though S P is formulated as a fully static schedule, it is implemented using our modified form of self-timed execution; that is, the constraints imposed by our modified form of self-timed execution lead naturally to this kind of periodic pattern in the steady state.
This kind of periodic schedule helps to significantly reduce the complexity of performance analysis since the iterative dataflow execution is characterized by a relatively compact periodic structure.
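To make the periodic pattern concrete, the following sketch simulates a two-actor TASM in discrete time under the MSTE firing rule. It is an illustration under stated assumptions rather than the analysis used in this paper: execution times are positive integers, an actor consumes its inputs when a firing starts and releases its outputs when the firing ends, and the enabling conditions of both actors are evaluated on the token counts observed at the start of each time step. The function name and signature are hypothetical.

```python
def simulate_mste(p, c, D, t_src, t_snk, d=0, horizon=50):
    """Return the list of (start_time, actor) events up to `horizon`."""
    tau1, tau2 = d, D - d        # filled cells / free cells
    src_end = snk_end = None     # end time of the firing in progress
    starts = []
    for t in range(horizon):
        # 1. Complete firings that end at time t.
        if src_end == t:
            tau1 += p            # source releases p live tokens
            src_end = None
        if snk_end == t:
            tau2 += c            # sink releases c free cells
            snk_end = None
        # 2. Evaluate enabling conditions on the state at time t.
        blocked = max(p, c) <= tau1 < p + c   # MSTE firing rule
        src_ok = src_end is None and tau2 >= p and not blocked
        snk_ok = snk_end is None and tau1 >= c
        # 3. Start every enabled firing (self-timed execution).
        if src_ok:
            tau2 -= p
            src_end = t + t_src
            starts.append((t, "src"))
        if snk_ok:
            tau1 -= c
            snk_end = t + t_snk
            starts.append((t, "snk"))
    return starts


# p = 2, c = 3, D = 7, unit execution times: the pattern repeats every 5 steps.
print(simulate_mste(2, 3, 7, 1, 1, horizon=5))
# [(0, 'src'), (1, 'src'), (2, 'snk'), (3, 'src'), (4, 'snk')]
```

In this example each period contains three source firings and two sink firings, which matches the repetition vector of an edge with p = 2 and c = 3.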
C. Subperiods in TASM
The entire firing pattern in an iteration (i.e., a single execution in the periodic repetition) of S P can be expressed as a sequence of subperiods, where by a subperiod (SP), we mean a smaller firing pattern within S P . From the additional firing rule that we introduce in our implementation model, we are able to constrain execution so that it becomes more structured, which leads to potential for more efficient static analysis. Fortunately, the constraints imposed by the MSTE firing rule do not impose significant performance limitations, which we will demonstrate in the experiments that we present in Section IX.
An SP is defined as the time period between two consecutive breakpoints of actor execution, where the breakpoints are derived from two key conditions. The first condition, which we denote by c 1 (t), is
$$c_1(t) = c_{1,a}(t) \wedge c_{1,b}(t),$$
where
$$c_{1,a}(t) = (\tau_1(t) \ge \max(p(e_i), c(e_i))), \quad \text{and} \quad c_{1,b}(t) = (\tau_1(t) < p(e_i) + c(e_i)). \qquad (6)$$
We say that the first condition, Condition 1, "holds" or "is true" at a given time instant θ if (6) is satisfied for t = θ (i.e., if both c 1,a (θ) and c 1,b (θ) hold). As we see from the MSTE firing rule, the source actor cannot be fired when Condition 1 is satisfied.
To introduce Condition 2, denoted by c 2 (t), it is useful to first define the following notion of inter and intra firing times of an actor. The set of inTRA firing times of an actor X, denoted by TRA(X), is defined as the set of time instants during which actor X is executing. This set can be formulated as follows.
where start(X, j) and end (X, j) are the start and end time, respectively, of actor X's j-th firing in a periodic schedule.
Similarly, the set of inTER firing times of an actor X, denoted by TER(X), is defined as the set of time instants during which actor X is not executing ("idle"). This set can be expressed as the complement of the inTRA firing times of X with respect to the set Z + of non-negative integers:
$$\mathrm{TER}(X) = Z^{+} \setminus \mathrm{TRA}(X).$$
Condition 2 associated with the definition of breakpoints is defined as
$$c_2(t) = (t \in \mathrm{TER}(v^T_{src})) \wedge (t \in \mathrm{TER}(v^T_{snk})), \qquad (8)$$
where v T src and v T snk are TASM actors in Fig. 1 . This condition expresses that breakpoints do not occur during the execution time of either actor in TASM; breakpoints occur only "between" executions of v T src and v T snk . Based on the two conditions, c 1 (t) and c 2 (t), the k-th breakpoint, denoted BP (k), is defined by
V. PROPERTIES OF SUBPERIODS IN TASM
As described in Sections IV-B and IV-C, MSTE leads to efficient static analysis because an execution pattern under MSTE can be decomposed into a periodic pattern, and such a pattern can be further decomposed into a sequence of subperiods (smaller patterns). A subperiod can be more precisely defined as the time between successive breakpoints. For convenience in this discussion, let the greatest common divisor (GCD) of p(e i ) and c(e i ) be denoted by g(e i ), and consider the following two mutually exclusive scenarios:
$$g(e_i) < \min(p(e_i), c(e_i)), \qquad (10)$$
and
$$g(e_i) = \min(p(e_i), c(e_i)). \qquad (11)$$
Under Scenario (10), we distinguish between two different types of subperiods that occur, and we refer to these types as SP α and SP β . Each of these two types consists of a fixed number of firings of v T src and v T snk . Thus, an iteration of S P is a sequence of subperiods, where each subperiod in the sequence takes on one of two statically-known forms -SP α and SP β . The specific numbers of firings are summarized in Table I . 
Here, f SP λ (X) represents the number of firings of actor X that occur in a subperiod of type λ ∈ {α, β}. Under Scenario (11), there exists only one type of subperiod in S P . In this case, p(e i ) divides c(e i ) or c(e i ) divides p(e i ), and it follows that the numbers of firings in Table I for the source and sink actors are the same between the rows corresponding to type α and type β. In other words, under Scenario (11), SP α and SP β are identical, and thus, execution proceeds based on only one type of subperiod.
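The distinction between the two scenarios depends only on the rates of the edge and can be checked directly. The helper below is a hypothetical illustration; the returned labels are not terminology from this paper.

```python
from math import gcd


def subperiod_scenario(p, c):
    """Classify an edge by the GCD condition on its rates.

    Returns "single" when g = min(p, c), i.e., one rate divides the other
    and only one subperiod type occurs, and "two-type" otherwise.
    """
    g = gcd(p, c)
    return "single" if g == min(p, c) else "two-type"


print(subperiod_scenario(2, 4))  # single   (2 divides 4)
print(subperiod_scenario(2, 3))  # two-type (gcd is 1 < 2)
```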
In summary, there are in general two types of subperiods to consider -SP α and SP β , and these forms are identical under Scenario (11) . Execution within S P can always be broken down into a succession of subperiods, where each individual subperiod conforms to one of these two forms. This is established by the following two lemmas. Basic notation related to TASM, which is used in our formulation of these lemmas, is summarized in Fig. 1(b) . Proofs of theorems and lemmas are omitted throughout the paper due to lack of space. 
A. Firing pattern analysis
We begin with the following lemma, which relates tokens produced and consumed by the source and sink actors, respectively, in TASM. Lemma 3 states that firing of v T snk is never delayed in a subperiod (i.e., it is not preceded by any idle time at the start of the subperiod). This is because v T snk does not need to wait for tokens produced by v T src in the same subperiod. While τ 1 (BP (k)) tokens are always sufficient to avoid delaying v T snk during each subperiod k, v T src may be delayed (due to a value of τ 2 (BP (k)) that is too small) if the allocated buffer size D(e i ) is not sufficient. In other words, v T src can be delayed to wait for tokens on e T (i, 2 ) that must be produced by v T snk ; equivalently, v src waits until one or more firings of v snk generate sufficient empty space in the buffer shown in Fig. 1(a) . Hence, the firing pattern of v T src in a subperiod is in general a function of the allocated buffer size D(e i ).
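From the second breakpoint condition c 2 (t) in (8), neither actor is executing at a breakpoint, so every buffer cell is either filled or free at that instant. The buffer size D(e i ) allocated on e i of Fig. 1(a) can therefore be represented by
$$D(e_i) = \tau_1(BP(k)) + \tau_2(BP(k)).$$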
We first derive a buffer size that is sufficient to guarantee that firings of v T src are never delayed. This derivation is general in the sense that it holds in the absence of information about the execution times of v T src and v T snk (beyond the assumption that the execution times are constant). Thus, this execution pattern analysis is useful for applications in which actor execution times are known to be constant, but whose constant values are not known exactly. Furthermore, this analysis provides a foundation for computing tighter buffer size requirements in the presence of known (constant) execution times (as we show in Section VII).
Theorem 1. Suppose that we are given a TASM G T i under MSTE, and suppose that the buffer size is given by
$$D(e_i) = \max(p(e_i), c(e_i)) + p(e_i) + c(e_i) - g(e_i).$$
Then firings of v T src are never delayed in any subperiod.
In the next two theorems, we derive buffer size levels that are sufficient to guarantee certain kinds of firing patterns for v T src and v T snk . These firing patterns are useful in throughput analysis.
Then in any given subperiod, there is exactly one firing of v Fig. 1(a) is bounded. 
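The buffer size stated in Theorem 1 depends only on the production and consumption rates of the edge, so it can be evaluated in constant time per edge. A minimal sketch follows; the function name is hypothetical.

```python
from math import gcd


def theorem1_buffer_size(p, c):
    """Buffer size from Theorem 1: max(p, c) + p + c - gcd(p, c)."""
    return max(p, c) + p + c - gcd(p, c)


print(theorem1_buffer_size(2, 3))  # 3 + 2 + 3 - 1 = 7
print(theorem1_buffer_size(4, 4))  # 4 + 4 + 4 - 4 = 8
```

Because this bound does not use actor execution times, it is the conservative choice; when execution times are known, smaller sizes may suffice, as discussed in Section VII.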
B. Saturated TASM systems
In this section, we assume that the execution times of actors are constant and known a priori, and we develop methods for throughput analysis of MSTE under this assumption.
We begin by defining some notation.
Definition 2. Suppose that we are given a TASM G T i that executes under MSTE, and suppose that in each subperiod, the resource r src operates without any idle time, that is, S P (t, r src ) = v T src for all t ∈ {0, 1, . . . , t i − 1}. Then we say that G T i is source-saturated. Similarly, if r snk executes without any idle time, then we say that G T i is sink-saturated.
Clearly, since the net production and consumption rates of v T src and v T snk are balanced across S P , it follows that the original SDF graph ( Fig. 1(a) ) executes at its maximum achievable throughput if G T i is source- or sink-saturated. This is summarized in the following property.
Property 1.
A TASM that is source-saturated or sink-saturated executes at its maximum achievable throughput when it executes under MSTE.
Due to the additional firing rule of MSTE (see (3)), the execution of TASM under MSTE has the following property.
Property 2. Suppose that we denote the total non-idle time of the resource r η in a type λ subperiod by
Then G T i is either source-or sink-saturated if
In particular, G T i can be neither source- nor sink-saturated under two corner cases, which we denote as corner case 1 (CC1) and corner case 2 (CC2). CC1 corresponds to the condition that the following two inequalities both hold:
Similarly, CC2 corresponds to the condition that (21) and (22) both hold:
Equation (19) means that r snk has nonzero idle time in each α-type subperiod, while (20) holds if r src has nonzero idle time in each β-type subperiod. Clearly, neither r src nor r snk is saturated in such a system. The corner cases CC1 and CC2 represent limitations in our MSTE approach since our guarantee of maximal throughput, as given by Property 1, does not apply under these cases. However, we observe that CC1 and CC2 do not apply to a broad class of practical systems, in particular, systems that contain functional blocks that act as bottlenecks, where by a "bottleneck", we mean a block whose computational complexity dominates that of the other functional blocks. For example, in the dataflow-based 3GPP-Long Term Evolution (LTE) protocol application developed in [18] , the FFT block can be observed to be a bottleneck.
In Section IX, we present detailed experimental studies with three practical applications, all of which involve bottleneck actors and corresponding avoidance of the corner cases (CC1 and CC2) that prevent source- and sink-saturated execution.
VII. ANALYSIS OF SATURATED SYSTEMS
Motivated by our discussion on bottleneck actors and the practical relevance of source- and sink-saturated systems, we develop in this section a detailed analysis of throughput-constrained buffer optimization for such systems. Throughout the remainder of this section, we assume that we are working with a source- or sink-saturated TASM, i.e., we assume that the corner cases CC1 and CC2 (defined in Section VI-B) do not hold.
Definition 3. Suppose that we are given an SDF graph G = (V, E) that executes under MSTE, and suppose that the time duration of S P (i.e., a single iteration of the periodic schedule) is denoted by t i . Then by the throughput of an actor v ∈ V , which we represent by Φ(v), we mean the number of firings of v that execute per unit time. Since q[v] firings of an actor v execute in each iteration of S P , where q denotes the repetition vector of G, we have that
$$\Phi(v) = \frac{q[v]}{t_i}.$$
Furthermore, by the throughput of G T i , which we refer to as the TASM throughput, we mean the reciprocal of the time duration of S P -i.e., (1/t i ).
In this section, we show how to determine an upper bound on the buffer size required to execute G T i at its maximum achievable throughput. Henceforth, we refer to the reciprocal of this maximum achievable throughput as t min .
We remind the reader that although this analysis is developed for two-actor SDF graphs, the methods can be applied to arbitrary tree-structured SDF graphs, as described in Section I, by using them on each edge (and the underlying two-actor subgraph) separately and combining the results.
Property 3. Suppose that we are given a TASM G T i . From the definitions of S P , t i , t min , and actor throughput, we have that 
If the execution times of v T src and v T snk are known, then it may be possible to exploit this knowledge to relax the buffering requirements, and thereby save resources on the target FPGA device. In particular, we can reduce buffering requirements if after applying the reduced buffer size given by Theorem 2 or Theorem 3 (based on whether p(e i ) ≥ c(e i ) or p(e i ) < c(e i ), respectively), the resulting throughput given by (24) still meets the given throughput constraint.
In a given enclosing SDF graph G = (V, E), the minimum achievable iteration period for S P is given by
where
If an iteration of S P completes exactly every t min time units, then we can conclude that v btlneck is source- or sink-saturated, and the overall TASM throughput cannot be increased further.
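The quantities in this section can be illustrated with a short sketch for tree-structured graphs. The code below is hypothetical and rests on two labeled assumptions: (1) the repetition vector q is obtained by propagating the balance equations q[src] * p = q[snk] * c along the edges of a connected tree, and (2) since successive firings of the same actor are serialized on a dedicated resource, the iteration period is bounded from below by q[v] * T(v) for every actor v, so the bottleneck is taken to be the actor that maximizes this product. Here T(v) denotes the constant execution time of v.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Minimal repetition vector of a connected, tree-structured SDF graph.

    edges: list of (src, snk, p, c) tuples (hypothetical input format).
    """
    adj = {a: [] for a in actors}
    for s, k, p, c in edges:
        adj[s].append((k, Fraction(p, c)))   # q[k] = q[s] * p / c
        adj[k].append((s, Fraction(c, p)))   # q[s] = q[k] * c / p
    q = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            if b not in q:
                q[b] = q[a] * ratio
                stack.append(b)
    # Scale to the smallest all-integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (f.denominator for f in q.values()), 1)
    ints = {a: int(f * scale) for a, f in q.items()}
    g = reduce(gcd, ints.values())
    return {a: n // g for a, n in ints.items()}


def bottleneck_period(q, exec_time):
    """Assumed lower bound on the period: max over v of q[v] * T(v)."""
    v = max(q, key=lambda a: q[a] * exec_time[a])
    return v, q[v] * exec_time[v]


q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
print(q)                                                 # {'A': 3, 'B': 2, 'C': 1}
print(bottleneck_period(q, {"A": 1, "B": 5, "C": 2}))    # ('B', 10)
```

In this example, if the period equals 10 time units, actor A has throughput q[A] / t i = 3/10 firings per unit time under Definition 3.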
VIII. APPLICATION TO GENERAL TREE-STRUCTURED SDF GRAPHS
Our TASM analysis can be applied iteratively to determine buffer sizes for all edges in an arbitrary tree-structured SDF graph. This assumption of a tree-structured graph is needed to ensure that the "extra (feedback) edges" added by the TASM models for different SDF graph edges do not "interact" (i.e., introduce new directed cycles in the overall graph model). Many practical SDF graphs or subsystem models are tree-structured, including models for multi-stage sample rate conversion, and various kinds of filterbanks, as well as the JPEG and OFDM transmitter applications that we examine in Section IX.
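The edge-by-edge application described above can be sketched as follows. This is not Algorithm 1 itself: as a conservative stand-in, the sketch assigns every edge the execution-time-independent buffer size of Theorem 1, after checking that the graph is tree-structured (no cycle in the underlying undirected graph, and |E| = |V| − 1). All function names and the tuple-based edge format are assumptions made for illustration.

```python
from math import gcd


def is_tree(actors, edges):
    """True if the underlying undirected graph is a connected tree."""
    parent = {a: a for a in actors}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for s, k, _, _ in edges:
        rs, rk = find(s), find(k)
        if rs == rk:
            return False                    # edge closes an undirected cycle
        parent[rs] = rk
    return len(edges) == len(actors) - 1


def size_tree_buffers(actors, edges):
    """Map each edge (src, snk) to the Theorem 1 buffer size."""
    if not is_tree(actors, edges):
        raise ValueError("graph is not tree-structured")
    return {(s, k): max(p, c) + p + c - gcd(p, c) for s, k, p, c in edges}


sizes = size_tree_buffers(["A", "B", "C"],
                          [("A", "B", 2, 3), ("B", "C", 4, 4)])
print(sizes)  # {('A', 'B'): 7, ('B', 'C'): 8}
```

Each edge is visited once and handled independently, which is consistent with the linear dependence on the number of edges; the independence of edges is exactly what the tree-structure assumption provides.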
Algorithm 1 provides a systematic procedure for determining buffer sizes for an arbitrary, tree-structured SDF graph in a way that guarantees that the achieved performance will satisfy a given throughput constraint. The output of this procedure is a buffer size function D : E → Z pos , where Z pos denotes the set of positive integers. The complexity of this algorithm is O(|E|), which renders the approach practical for DSP and FPGA design tools.
Algorithm 1 (excerpt):
    for each e ∈ E do
        p ← p(e); c ← c(e);
        t src ← T (src(e)); t snk ← T (snk(e))
        if p ≥ c then
            [...]
[...] of the system due to the high number of taps. Similarly, a discrete cosine transform (DCT) block in the JPEG encoder and the inverse FFT block in the DVB-T OFDM transmitter are bottlenecks for their respective applications. Table II shows the results of Algorithm 1 and the associated synthesis results on the targeted FPGA. Based on the synthesis results for the three applications, we verified that all of the solutions operate at the corresponding maximum achievable throughput levels, which correspond to the absence of idle time in the execution profiles of the resources that execute the associated bottleneck actors. Our results are therefore consistent with our theoretical results, which guarantee throughput optimality under the buffer sizes derived from Algorithm 1.
X. CONCLUSION
We have presented a novel algorithm to provide upper bounds on FPGA buffer distributions for throughput-optimal execution of synchronous dataflow graphs that are in the form of tree-structured, directed acyclic graphs. The resulting bounds can be employed directly as buffer sizes when mapping SDF graphs into digital hardware. A distinguishing aspect of our proposed algorithm is that it has low polynomial time complexity, which makes it especially useful for rapid prototyping and for implementation of large-scale or heavily multirate designs. Our work appears promising for integration into high-level design processes for FPGA-based DSP system implementation, as our experiments with LabVIEW FPGA demonstrate.
