Prototyping scalable digital signal processing systems for radio
  astronomy using dataflow models by Sane, Nimish et al.
arXiv:1204.4696v1 [astro-ph.IM] 20 Apr 2012
Radio Science, Volume ???, Number , Pages 1–14,
Prototyping scalable digital signal processing systems
for radio astronomy using dataflow models
N. Sane,1 J. Ford,2 A. I. Harris,3 and S. S. Bhattacharyya4
There is a growing trend toward using high-level tools for design and implementation of
radio astronomy digital signal processing (DSP) systems. Such tools, for example, those
from the Collaboration for Astronomy Signal Processing and Electronics Research
(CASPER), are usually platform-specific, and lack high-level, platform-independent,
portable, scalable application specifications. This limits the designer’s ability to
experiment with designs at a high level of abstraction and early in the development cycle.
We address some of these issues using a model-based design approach employing
dataflow models. We demonstrate this approach by applying it to the design of a tunable
digital downconverter (TDD) used for narrow-bandwidth spectroscopy. Our design is
targeted toward an FPGA platform, called the Interconnect Break-out Board (IBOB), that
is available from CASPER. We use the term TDD to refer to a digital downconverter
for which the decimation factor and center frequency can be reconfigured without
regenerating the hardware code. Such a design is currently not available in the
CASPER DSP library.
The work presented in this paper focuses on two aspects. Firstly, we introduce and
demonstrate a dataflow-based design approach using the dataflow interchange format
(DIF) tool for high-level application specification, and we integrate this approach with the
CASPER tool flow. Secondly, we explore the trade-off between the flexibility of TDD
designs and the low hardware cost of fixed-configuration digital downconverter (FDD)
designs that use the available CASPER DSP library. We further explore this trade-off in
the context of a two-stage downconversion scheme employing a combination of TDD or
FDD designs.
1N. Sane was with the Department of Electrical and
Computer Engineering, and Institute for Advanced
Computer Studies at University of Maryland, College
Park, MD, USA. He is now with the Department of
Physics, and Center for Solar-Terrestrial Research at
New Jersey Institute of Technology, Newark, NJ, USA.
2National Radio Astronomy Observatory, Green
Bank, West Virginia, USA.
3Department of Astronomy, University of Maryland,
College Park, Maryland, USA.
4Department of Electrical and Computer
Engineering, and Institute for Advanced Computer
Studies, University of Maryland, College Park,
Maryland, USA.
Copyright 2018 by the American Geophysical Union.
0048-6604/18/$11.00
1. Introduction
Key challenges in designing digital signal processing
(DSP) systems employed in the field of radio astronomy
arise from the need to process very large
amounts of data at very high rates arriving from one
or more telescopes. It is also desirable to have scal-
able and reconfigurable designs for shorter develop-
ment cycles and faster deployment. Moreover, these
designs should be portable to different platforms to
keep up with advances in new hardware technologies.
However, conventional design methodologies for sig-
nal processing systems in the field of radio astronomy
focus on custom designs that are platform-specific.
Such designs, by virtue of being platform-specific,
are highly specialized, and thus difficult to retar-
get. Traditional design approaches also lack high-
level platform-independent application specifications
that can be experimented with, and later ported to
and optimized for various target platforms. This lim-
its the scalability, reconfigurability, portability, and
evolvability across varying requirements and plat-
forms of such DSP systems.
A model-based approach for design and implementation
of a DSP system can effectively exploit
the semantics of the underlying models of compu-
2 SANE ET AL.: DATAFLOW MODELS FOR RADIO ASTRONOMY DSP
tation. This facilitates precise estimation and op-
timization of system performance and resource re-
quirements (e.g., see [Bhattacharyya et al., 2010]).
Though approaches for scalable and reconfigurable
design based on modular field programmable gate
array (FPGA) hardware and software libraries have
been developed (e.g., see [Parsons et al., 2005, 2006;
Szomoru, 2011; Nallatech website; Lyrtech website]),
they do not provide forms of high-level abstraction
that are linked to formal models of computation.
We propose an approach using DSP-oriented
dataflow models of computation to address some of
these issues [Lee and Messerschmitt, 1987]. Dataflow
modeling is extensively used in developing embedded
systems for signal processing and communication ap-
plications, and electronic design automation [Bhat-
tacharyya et al., 2010]. Our design methodology in-
volves specifying the application in the dataflow in-
terchange format (DIF) [Hsu et al., 2005] using an
appropriate dataflow model. This application speci-
fication is transformed into an intermediate, graph-
ical representation, which can be further processed
using graph transformations.
The DIF tool allows designers to verify the func-
tional correctness of the application, estimate re-
source requirements, and experiment with various
dataflow graph transformations, which help to an-
alyze or optimize the design in terms of specific ob-
jectives. The DIF-based dataflow specification is
then used as a reference while developing a platform-
specific implementation. We show how formal under-
standing of the dataflow behavior from the software
prototype allows more efficient prototyping and ex-
perimentation at a much earlier stage in the design
cycle compared to conventional design approaches.
We demonstrate our approach using the design
of a tunable digital downconverter (TDD) that al-
lows fine-grain spectroscopy on narrow-band signals.
A primary motivation behind a TDD design is to
support changes to the targeted downsampling ratio
without requiring regeneration of the corresponding
hardware code. Development of such a TDD is a
significant contribution of this work. We compare
our TDD with the fixed-configuration digital down-
converter (FDD) designs that use the current DSP
library from the Collaboration for Astronomy Sig-
nal Processing and Electronics Research (CASPER)
(see [CASPER Website]). We explore trade-offs be-
tween the flexibility offered by TDD designs and their
hardware cost. A TDD is particularly useful since
our target FPGA hardware platform, the interconnect
break-out board (IBOB) [Parsons et al., 2006], cannot
store more than one configuration (also referred
to as a “personality”) and dynamically load one of
them, unlike some of the
CASPER hardware platforms of a later generation.
A single reconfigurable TDD design also simplifies
code management when compared to multiple static
designs.
We must emphasize that this paper describes a
dataflow-based design flow for prototyping radio as-
tronomy DSP systems. This approach is not re-
stricted to any particular tool or hardware plat-
form. We intend to demonstrate it by developing
a high-level DIF prototype that uses dataflow for-
malisms and generating a hardware implementation
using CASPER tools from this DIF prototype. The
proposed approach is not intended to replace the
CASPER tools. It offers enhancements to the ex-
isting CASPER design flow. However, this does not
restrict its use to only the CASPER tools.
The organization of the rest of this paper is as
follows. Section 2 describes a TDD application. Sec-
tion 3.1 describes dataflow modeling in detail, along
with some of the relevant forms of dataflow (dataflow
models) that are employed in practice. A reader who
is familiar with dataflow formalisms may skip this
section. Section 3.2 provides information about the
DIF tool, while Section 3.3 highlights some of the
relevant prior work. Section 4 explains how a DIF
prototype can be used to develop a hardware imple-
mentation. Section 5 provides a summary and our
conclusions.
2. Tunable Digital Downconverter
In the DSP literature, the terms downsampling
and decimation are often used interchangeably. In
this paper, a decimator refers to a block that simply
decimates or downsamples the input signal without
any other processing (e.g., see Fig. 1(a) and (b)).
The ratio of the sampling rate at the input of a dec-
imator to that at its output is referred to as its deci-
mation factor. A decimator is generally preceded by
an anti-aliasing filter [Vaidyanathan, 1990]. In this
paper, we refer to such a combined structure, consist-
ing of a filter and decimator, as a decimation filter
(e.g., see Fig. 2(a) and (b)). In a polyphase imple-
mentation of a decimation filter, such as the one we
use in our implementation, this structure is imple-
mented as a single computing block [Vaidyanathan,
1990]. We refer to the system or application that em-
ploys a decimator or decimation filter, possibly with
other blocks such as mixers and filters, as a digital
downconverter, and in particular, an FDD or TDD
(e.g., see Fig. 3 and Fig. 4). The decimation factor
of a decimation filter, TDD, or FDD refers to that of
the decimator in it.
Fig. 3 shows a block diagram of a TDD appli-
cation. An 8-bit analog-to-digital converter (ADC)
receives a baseband input IF signal of bandwidth
800 MHz and samples it at a sampling rate of
1.6 giga-samples/second (GS/s). The internal design
of the ADC block is such that 8 consecutive
time samples, where each sample is an 8-bit
fixed-point number, are output on eight 8-bit buses
on the same clock pulse. This results in
200 mega-samples/second (MS/s) on each of the outputs of
the ADC block. Correspondingly, all the down-
stream blocks also have 8 input and output ports.
Thus, there are 8 connections between any two blocks
shown in Fig. 3 that are directly connected. We have
not shown all 8 connections in detail for the sake of
clarity and simplicity.
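The rate arithmetic in this paragraph can be checked with a short Python sketch. The function and constant names are illustrative only and are not taken from the CASPER libraries; the assignment of sample k to port k mod 8 is an assumption about the ADC block's output ordering.

```python
# Illustrative check of the ADC demultiplexing arithmetic described above.
# All names are hypothetical; only the numbers come from the text.

ADC_SAMPLE_RATE_HZ = 1.6e9   # 1.6 GS/s sampling rate
PARALLEL_OUTPUTS = 8         # eight 8-bit buses per clock pulse


def per_port_rate(sample_rate_hz, ports):
    """Sample rate seen on each parallel output port."""
    return sample_rate_hz / ports


def demultiplex(samples, ports):
    """Split a serial sample stream into `ports` interleaved streams.

    Assumed ordering: port k carries samples k, k + ports, k + 2*ports, ...
    """
    return [samples[k::ports] for k in range(ports)]


rate = per_port_rate(ADC_SAMPLE_RATE_HZ, PARALLEL_OUTPUTS)
print(f"{rate / 1e6:.0f} MS/s per port")

streams = demultiplex(list(range(16)), PARALLEL_OUTPUTS)
print(streams[0], streams[7])
```

Because every downstream block has the same 8 input and output ports, each block processes 8 samples per clock cycle at the 200 MHz port rate.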
The TDD subsystem, identified by the dotted box
in Fig. 3, extracts a subband of the input signal
with a user-specified center frequency (Cf ) and band-
width (Bw), downconverts it to a baseband, and then
downsamples it to the Nyquist rate. For example,
Fig. 5 shows two of the possible configurations of
Bw and Cf and the corresponding frequency bands
that are extracted. The output of the TDD can be
used by the downstream DSP blocks. For example,
a possible scheme can have a TDD implementation
on the IBOB. The downstream DSP blocks may in-
clude functions such as polyphase filtering and fast
Fourier transform. These blocks can be implemented
on different hardware. This is possible using a communication
link between two hardware boards that
behaves as a FIFO buffer. An Ethernet link using
10 Gigabit Attachment Unit Interface (XAUI) ports available
on the IBOB is an example of such a link.
During narrow-band observations, the Nyquist-sampled
output of the TDD will be analyzed with an
existing spectrometer. The same number of spectral
channels will thus provide proportionately greater
spectral resolution as compared to analyzing the en-
tire input bandwidth. Our TDD design supports
integer decimation factors between 5 and 12. The
choice of these values stems purely from the initial
specification of the Green Bank Ultimate Pulsar Pro-
cessing Instrument (GUPPI) [Ford and Ray, 2010].
This should be considered simply as a demonstra-
tive implementation. The approach presented in this
paper does not restrict the design in any way from
having different specifications. The valid values of
Cf corresponding to the selected Bw can vary so as
to span the entire 800MHz IF input.
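As a rough illustration of how the decimation factor relates to the extracted bandwidth, the following Python sketch tabulates one plausible reading of the specification. It assumes that a decimation factor D selects a subband of width Bw = 800 MHz / D (so that the decimated output is Nyquist sampled) and that Cf may range from Bw/2 to 800 MHz − Bw/2; neither formula is stated explicitly in the text, and all names are hypothetical.

```python
# Hypothetical helper relating decimation factor to TDD configuration.
# Assumptions (not stated in the text): Bw = 800 MHz / D, and the
# selected band must lie entirely inside the 0-800 MHz IF input.

IF_BANDWIDTH_MHZ = 800.0
INPUT_RATE_MSPS = 1600.0
DECIMATION_FACTORS = range(5, 13)   # integer factors 5 through 12


def tdd_configuration(d):
    """Return the assumed bandwidth, output rate, and Cf range for factor d."""
    if d not in DECIMATION_FACTORS:
        raise ValueError(f"unsupported decimation factor: {d}")
    bw = IF_BANDWIDTH_MHZ / d
    return {
        "decimation": d,
        "bandwidth_mhz": bw,
        "output_rate_msps": INPUT_RATE_MSPS / d,
        "cf_min_mhz": bw / 2,
        "cf_max_mhz": IF_BANDWIDTH_MHZ - bw / 2,
        # Same number of spectral channels over 1/d of the band.
        "resolution_gain": d,
    }


for d in DECIMATION_FACTORS:
    cfg = tdd_configuration(d)
    print(d, round(cfg["bandwidth_mhz"], 2), round(cfg["output_rate_msps"], 2))
```

Under these assumptions, a decimation factor of 8 corresponds to a 100 MHz subband sampled at 200 MS/s, with eight times finer spectral resolution for a fixed channel count.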
As shown in Fig. 3, the TDD includes a tunable
finite impulse response (FIR) filter. If the desired
output is a baseband signal, then the FIR filter sim-
ply acts as a low-pass filter. Also, in this case, the
fork (which can be viewed as a dataflow version of
a signal splitting block) and select (which is simi-
lar to a multiplexer) blocks are configured to route
the output of the FIR filter directly to the tunable
decimation filter (TDF), bypassing the mixer.
If the desired output is not a baseband signal, the
FIR filter acts as a bandpass filter (BPF). The cut-off
frequencies for this BPF are set using the specified
parameter configuration (Bw and Cf ). In this case,
the output of the BPF is fed to a real mixer, which
translates it into a baseband signal. The local oscil-
lator, with a frequency fLO, is implemented as a nu-
merically controlled oscillator (NCO). The frequency,
fLO, depends on the values of Cf and Bw. The
output of the mixer is then fed to the TDF, which
downsamples its input depending upon the specified
Bw or decimation factor. We have used this scheme
in order to have a real-valued TDF output.
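The mixing step can be illustrated with a minimal Python sketch. The text does not give the relation between fLO, Cf, and Bw; the sketch assumes fLO = Cf − Bw/2 (the lower edge of the selected band), which maps the band to 0..Bw. The function names are hypothetical.

```python
import math

# Sketch of a real mixer driven by an NCO. The LO choice below is an
# assumption; the text only says that fLO depends on Cf and Bw.


def lo_frequency_mhz(cf_mhz, bw_mhz):
    """Assumed LO frequency: the lower edge of the selected band."""
    return cf_mhz - bw_mhz / 2


def mixer_products_mhz(f_mhz, f_lo_mhz):
    """Difference and sum frequencies a real mixer produces for one tone."""
    return abs(f_mhz - f_lo_mhz), f_mhz + f_lo_mhz


def mix(sample, n, f_lo_mhz, fs_msps=1600.0):
    """Multiply input sample number n by the NCO output cos(2*pi*fLO*n/fs)."""
    return sample * math.cos(2 * math.pi * f_lo_mhz * n / fs_msps)


f_lo = lo_frequency_mhz(400.0, 100.0)          # band 350-450 MHz
print(f_lo, mixer_products_mhz(360.0, f_lo))   # tone at 360 MHz
```

In this reading, the difference component falls inside the baseband 0..Bw, while the sum component lies outside it and would be rejected by the anti-aliasing filter in the TDF.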
Such a TDD, which was originally designed for
the GUPPI at the National Radio Astronomy Ob-
servatory (NRAO), Green Bank, finds its use in the
spectrometers currently under development for the
Green Bank telescope (GBT) and 20m telescope at
the NRAO, Green Bank.
3. Background
3.1. Dataflow Modeling
Dataflow modeling involves representing an appli-
cation using a directed graph G(V,E), where V is
a set of vertices (nodes) and E is a set of edges.
Each vertex u ∈ V in a dataflow graph is called
an actor, and represents a specific computational
block, while each directed edge (u, v) ∈ E repre-
sents a first-in-first-out (FIFO) buffer that provides
a communication link between the source actor u
and the sink actor v. A dataflow graph edge e can
also have a non-negative integer delay, del(e), asso-
ciated with it, which represents the number of initial
data values (tokens) present in the associated buffer.
Dataflow graphs operate based on data-driven exe-
cution, where an actor can be executed (fired) when-
ever it has sufficient amounts of data (numbers of
“samples” or data “tokens”) available on all of its
inputs. Typically, in DSP-oriented dataflow design
environments, the execution of a dataflow graph can
be thought of as that of a “globally asynchronous
locally synchronous” (GALS) system [Suhaib et al.,
2008; Shen and Bhattacharyya, 2009].
During each firing, an actor consumes a certain
number of tokens from each input and produces a cer-
tain number of tokens on each output. When these
numbers are constant (over all firings), we refer to
the actor as a synchronous dataflow (SDF) actor [Lee
and Messerschmitt, 1987]. For an SDF actor, the
numbers of tokens consumed and produced in each
actor execution are referred to as the consumption
rate and production rate of the associated input and
output, respectively. If the source and sink actors
of a dataflow graph edge are SDF actors, then the
edge is referred to as an SDF edge, and if a dataflow
graph consists of only SDF actors and SDF edges,
the graph is referred to as an SDF graph.
For a dataflow graph edge e, src(e) and snk(e) denote
its source and sink actors, and if e is an SDF
edge, then prd(e) denotes the production rate of the
output port of src(e) that is connected to e, and sim-
ilarly, cns(e) denotes the consumption rate of the in-
put port of snk(e) that is connected to e.
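The notation above maps directly onto a small data structure. The following Python sketch is an illustration only (it is not the DIF package API): each edge holds a FIFO preloaded with del(e) initial tokens, and an actor may fire when every input edge holds at least cns(e) tokens.

```python
from collections import deque
from dataclasses import dataclass, field

# Minimal model of SDF edges and the data-driven firing rule.
# Class and function names are hypothetical.


@dataclass
class SdfEdge:
    src: str            # src(e): source actor
    snk: str            # snk(e): sink actor
    prd: int            # prd(e): tokens produced per firing of src(e)
    cns: int            # cns(e): tokens consumed per firing of snk(e)
    delay: int = 0      # del(e): number of initial tokens
    fifo: deque = field(default_factory=deque)

    def __post_init__(self):
        # Initial tokens; the value 0 is an arbitrary placeholder.
        self.fifo.extend([0] * self.delay)


def can_fire(actor, edges):
    """An actor is fireable when all of its inputs hold enough tokens."""
    return all(len(e.fifo) >= e.cns for e in edges if e.snk == actor)


def fire(actor, edges, token=1):
    """Consume cns(e) tokens on each input, produce prd(e) on each output."""
    if not can_fire(actor, edges):
        raise RuntimeError(f"actor {actor} is not fireable")
    for e in edges:
        if e.snk == actor:
            for _ in range(e.cns):
                e.fifo.popleft()
    for e in edges:
        if e.src == actor:
            e.fifo.extend([token] * e.prd)


edges = [SdfEdge("A", "B", prd=2, cns=3)]
fire("A", edges)
fire("A", edges)
print(can_fire("B", edges))   # B needs 3 tokens; 4 are now available
```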
A static schedule for a dataflow graph G is a se-
quence of actors in G that represents the order in
which actors are fired during an execution of G.
Usually, production and consumption information
— in particular, the number of tokens produced and
consumed (production/consumption volume) — by
individual firings is characterized in terms of indi-
vidual input and output ports so that each port
of an actor can in general have a different produc-
tion or consumption volume characterization. Such
characterizations can involve constant values as in
SDF [Lee and Messerschmitt, 1987] (as described
above); periodic patterns of constant values, as in
cyclo-static dataflow (CSDF) [Bilsen et al., 1996]; or
more complex forms that are data-dependent (e.g.,
see [Buck, 1993; Bhattacharya and Bhattacharyya,
2000; Murthy and Lee, 2002; McAllister et al., 2004;
Plishker et al., 2008]). A meta-modeling technique
called parameterized dataflow (PDF) allows limited
forms of dynamic behavior [Bhattacharya and Bhat-
tacharyya, 2000] in terms of run-time changes to
dataflow graph parameters. The Boolean dataflow
(BDF) [Buck, 1993] and core functional dataflow
(CFDF) [Plishker et al., 2008] models are highly
expressive (Turing complete) dynamic dataflow models.
We explain the SDF, CSDF, and PDF models
in greater detail later in this section.
Apart from DIF, which we have mentioned ear-
lier, there are various existing design tools with their
semantic foundations in dataflow modeling, such
as Ptolemy [Pino et al., 1995], LabVIEW [John-
son, 1997], StreamIt [Thies et al., 2002], CAL [Eker
and Janneck, 2003], PeaCE [Kwon et al., 2004],
Compaan/Laura [Stefanov et al., 2004], and Sys-
teMoc [Haubelt et al., 2007]. Dataflow-oriented
DSP design tools typically allow high-level appli-
cation specification, software simulation, and possi-
bly synthesis for hardware or software implementa-
tion [Bhattacharyya et al., 2010].
3.1.1. Synchronous Dataflow
An SDF graph is characterized by its compile-time
predictability through the statically known consump-
tion and production rates, as defined above. Fig. 6
shows a simple SDF graph having actors W, X, Y, and Z
(shown as circles or vertices of the graph). Each edge
(an arrow in the figure connecting a pair of actors)
is annotated with the number of tokens produced on
it by the source actor and that consumed from it by
the sink actor during every invocation of the source
and sink actors, respectively. For example, actor X
can be fired when there are at least two tokens on
its input. Whenever actor X is fired, it consumes two
tokens from its input buffer, and produces three to-
kens onto the output buffer connected to Y and two
tokens onto the output buffer connected to Z.
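The compile-time predictability of SDF can be made concrete by solving the balance equations q(src(e)) · prd(e) = q(snk(e)) · cns(e) for the smallest positive integer repetitions vector q [Lee and Messerschmitt, 1987] and then constructing a static schedule. The Python sketch below does this for a graph shaped like the one described above. Only the rates of actor X come from the text; the rates assumed for W, Y, and Z are illustrative, and the scheduler ignores edge delays for brevity.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Edges are (src, snk, prd, cns). Rates on W, Y, and Z are assumptions.
ACTORS = ["W", "X", "Y", "Z"]
EDGES = [("W", "X", 1, 2), ("X", "Y", 3, 1), ("X", "Z", 2, 1)]


def repetitions_vector(actors, edges):
    """Smallest positive integer solution of the SDF balance equations."""
    q = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for src, snk, prd, cns in edges:
            if src == a and snk not in q:
                q[snk] = q[a] * prd / cns
                pending.append(snk)
            elif snk == a and src not in q:
                q[src] = q[a] * cns / prd
                pending.append(src)
    if len(q) != len(actors):
        raise ValueError("graph is not connected")
    for src, snk, prd, cns in edges:
        if q[src] * prd != q[snk] * cns:
            raise ValueError("inconsistent sample rates")
    # Scale the rational solution to the smallest integer vector.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (f.denominator for f in q.values()), 1)
    ints = {a: int(f * scale) for a, f in q.items()}
    common = reduce(gcd, ints.values())
    return {a: n // common for a, n in ints.items()}


def static_schedule(actors, edges, q):
    """Fire any ready actor until each has fired q[actor] times."""
    tokens = [0] * len(edges)          # no initial delays in this sketch
    remaining = dict(q)
    schedule = []
    progress = True
    while progress and any(remaining.values()):
        progress = False
        for a in actors:
            ready = all(tokens[i] >= cns
                        for i, (_, snk, _, cns) in enumerate(edges)
                        if snk == a)
            if remaining[a] > 0 and ready:
                for i, (src, snk, prd, cns) in enumerate(edges):
                    if snk == a:
                        tokens[i] -= cns
                    if src == a:
                        tokens[i] += prd
                remaining[a] -= 1
                schedule.append(a)
                progress = True
    if any(remaining.values()):
        raise ValueError("deadlock: no valid schedule without delays")
    return schedule


q = repetitions_vector(ACTORS, EDGES)
print(q)
print(static_schedule(ACTORS, EDGES, q))
```

With the assumed rates, one schedule iteration fires W twice, X once, Y three times, and Z twice, after which every buffer returns to its initial state.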
3.1.2. Cyclo-static Dataflow
Many signal processing applications involve be-
haviors in which production and consumption rates
may change during run-time. In some cases, these
changes may, however, be known at compile-time.
For example, consider the CSDF graph shown in
Fig. 1(a), which has a decimator actor M in it. This
actor consumes one token from its input on each in-
vocation, but produces a token onto its output only
on every fourth invocation. This behavior has been
depicted using the varying production volumes de-
noted by [1 0 0 0]. The numbers of tokens produced
by the decimator M follow this cyclic pattern with
a period of 4. This sequence of varying produc-
tion volumes, though not leading to constant output
rates like an SDF actor, is still completely deterministic
and known at compile time. This kind of
dataflow behavior, where actors exhibit token pro-
duction and consumption volumes (in terms of to-
kens per firing on specific actor ports) that are either
constant or expressible as cyclic sequences of con-
stant volumes, is referred to as CSDF. Thus, CSDF
can be viewed as a generalization of SDF in which
token production and consumption volumes may be
different across different firings of an actor, but fol-
low cyclic patterns that are completely specified at
compile time.
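The cyclo-static behavior of the decimator M can be sketched in a few lines of Python. The class below is illustrative only: it consumes one token per firing and produces tokens according to a cyclic production pattern such as [1 0 0 0]. Which of the consumed tokens is forwarded is an assumption (here, the token consumed during the producing firing).

```python
from itertools import cycle

# Sketch of a CSDF decimator actor with a cyclic production pattern.
# Names are hypothetical.


class CsdfDecimator:
    def __init__(self, factor):
        # One token produced on the first firing of each period,
        # none on the remaining (factor - 1) firings.
        self.production_pattern = [1] + [0] * (factor - 1)
        self._phase = cycle(self.production_pattern)

    def fire(self, token):
        """Consume one token; return the list of tokens produced (0 or 1)."""
        return [token] if next(self._phase) else []


def run(decimator, samples):
    """Fire the decimator once per input sample and collect its output."""
    out = []
    for s in samples:
        out.extend(decimator.fire(s))
    return out


m = CsdfDecimator(4)
print(m.production_pattern)
print(run(m, list(range(10))))
```

The production pattern is fixed when the actor is constructed, which mirrors the requirement that CSDF patterns be fully specified at compile time.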
We refer readers to [Bilsen et al., 1996] for more
details on the CSDF model. As shown in Fig. 1(a)
and Fig. 1(b), it may be possible to transform a
CSDF actor into an SDF actor. In general, when
feedback loops are present in a dataflow graph, such
a transformation may introduce deadlock, and there-
fore should be attempted with caution. Such a trans-
formation, when admissible (not leading to dead-
lock), generally has trade-offs in terms of relevant
metrics including latency, throughput, and code size.
More detailed comparisons between the SDF and
CSDF models of computation are presented in [Parks
et al., 1995] and [Bhattacharyya et al., 2000].
3.1.3. Parameterized Dataflow
Though CSDF provides enhanced expressive
power compared to SDF, it is still unable to spec-
ify patterns in token consumption and production
volumes that are not fully known at compile time. A
meta-modeling technique called PDF has been pro-
posed to represent certain kinds of dataflow appli-
cation dynamics [Bhattacharya and Bhattacharyya,
2000]. This model can be used with any arbi-
trary dataflow graph format that has a well-defined
notion of a schedule iteration. For example, the
PDF meta-model, when combined with an under-
lying SDF model, results in the PSDF (parameter-
ized synchronous dataflow) model. A PSDF graph
behaves like an SDF graph during one schedule iter-
ation, but can assume different configurations across
different schedule iterations.
The PDF meta-model supports semantic and syn-
tactic hierarchy. Syntactic hierarchy is used, as in
other forms of dataflow, to decompose complex de-
signs in terms of smaller components. On the other
hand, semantic hierarchy in PDF is used to apply
specific features in the meta-model that are associ-
ated with dynamic parameter reconfiguration. A hi-
erarchical actor that encapsulates such semantic hi-
erarchy in PDF is called a PDF subsystem. A PDF
subsystem in turn has three underlying graphs called
the init, subinit, and body graphs, which interact with
each other in structured ways. Intuitively, the init
and subinit graphs can capture data-dependent, dy-
namic behavior at certain points during the execu-
tion of the graph and configure the body graph to
adapt in useful ways to such dynamics. More specifically,
the init graph is designed to capture parameter
configuration that is driven by higher, system-level
processing, while the subinit graph is designed to
capture the parameter changes occurring across different
iterations of the corresponding body graph. The init
graph can be used to dynamically configure parame-
ters in the subinit graph, which, in general, executes
more frequently relative to the init graph.
To further illustrate the PDF modeling technique,
we consider the application example shown
in Fig. 2(a). This example involves an FIR filter
with filter taps or coefficients given by CN =
[c0, c1, ..., cN−1] followed by a decimator with a
tunable decimation factor of D. The values of D and
CN are set through either a higher-level system or
a user interface. We skip the details of this mechanism
for the sake of simplicity and conciseness. Such
behavior can be modeled using PDF with an under-
lying CSDF model. Such a modeling approach is
referred to as the parameterized cyclo-static dataflow
(PCSDF) model [Saha et al., 2006]. Fig. 2(b) shows
one of the possible PCSDF graphs corresponding to
the application shown in Fig. 2(a). The subsys-
tem DF is a PCSDF subsystem with its component
graphs as shown in the figure. It can be seen here
that the control actor in the DF.init graph of the DF
subsystem sets the required external and internal
parameters, D and CN, respectively. This actor
models the required parameter control through either a
higher level system or some form of user interface. In
this particular case, the DF.subinit graph is empty
(in general, the init, subinit, and body graphs do not
all have to be used for a given subsystem).
The PCSDF model allows CSDF actors for which
the cyclic patterns of token production and consump-
tion volumes can be parameterized in terms of their
periods, the actual numbers of tokens consumed or
produced in the cyclo-static sequences, or both. In-
tuitively, for a given configuration of application pa-
rameters, a PCSDF graph behaves as a CSDF graph.
However, a PCSDF graph not only models all pos-
sible parameter configurations in a given application
but also describes how they can be changed at run-
time.
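The interplay between the init graph and the body graph can be sketched in Python as follows. This is a loose illustration of the DF subsystem, not the DIF implementation: the init method stands in for the control actor in DF.init, and body runs one iteration of the FIR filter followed by the decimator under the current parameters. The class name, the method names, and the choice to reset the filter state at the start of each iteration are all simplifying assumptions.

```python
# Loose sketch of a PCSDF-style subsystem: parameters D and C_N are set
# by init() between iterations; within an iteration the body behaves as
# a fixed CSDF graph. All names are hypothetical.


class DecimationFilterSubsystem:
    def __init__(self):
        self.D = None
        self.coeffs = None

    def init(self, D, coeffs):
        """Stand-in for the control actor in DF.init: set D and C_N."""
        if D < 1 or len(coeffs) < 1:
            raise ValueError("D and the number of taps must be at least 1")
        self.D = D
        self.coeffs = list(coeffs)

    def production_pattern(self):
        """Cyclo-static production pattern of the decimator for this D."""
        return [1] + [0] * (self.D - 1)

    def body(self, samples):
        """One body iteration: FIR filtering followed by decimation by D."""
        n_taps = len(self.coeffs)
        history = [0.0] * (n_taps - 1)      # filter state, reset each call
        pattern = self.production_pattern()
        out = []
        for i, x in enumerate(samples):
            window = [x] + history          # most recent sample first
            y = sum(c * w for c, w in zip(self.coeffs, window))
            history = window[:n_taps - 1]
            if pattern[i % self.D]:
                out.append(y)
        return out


df = DecimationFilterSubsystem()
df.init(D=2, coeffs=[1.0])
print(df.body([1, 2, 3, 4]))
df.init(D=4, coeffs=[0.5, 0.5])             # reconfigured between iterations
print(df.production_pattern(), df.body([2, 4, 6, 8, 10]))
```

For any fixed call to init, the body has a fixed cyclic production pattern, which corresponds to the statement that a PCSDF graph behaves as a CSDF graph for a given parameter configuration.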
Such a model is of particular interest for mod-
eling multirate DSP systems that exhibit parame-
terizable sample rate conversions. PCSDF allows
designers to systematically explore design spaces
across static, quasi-static, and dynamic implemen-
tation techniques. Here, by quasi-static implementa-
tion techniques, we mean techniques where relatively
large portions of the associated software or hardware
structures are fixed at compile-time with minor ad-
justments allowed at run-time (e.g., in response to
changes in input data or operating conditions). A
variety of quasi-static dataflow techniques are dis-
cussed, for example, in [Bhattacharyya et al., 2010].
3.2. The Dataflow Interchange Format
To describe dataflow graphs for a wide range
of DSP applications, developers can use
the DIF language, which is a standard language
founded in dataflow semantics and tailored for DSP
system design [Hsu et al., 2005]. DIF provides an in-
tegrated set of syntactic and semantic features that
help promote high-level modeling, analysis, and op-
timization of DSP applications and their implemen-
tations without over-specification. From a dataflow
point of view, DIF is designed to describe mixed-
grain graph topologies and hierarchies as well as to
specify dataflow-related and actor-specific informa-
tion. The dataflow semantic specification is based
on dataflow modeling theory and independent of any
design tool.
Fig. 7 illustrates some of the available constructs
in the DIF language along with the syntax used
for application specification. More details on the
DIF language can be found in [Hsu et al., 2007].
The topology block of the specification specifies the
graph topology, which includes all of the nodes and
edges in the graph. DIF supports built-in attributes
such as interface, refinement, parameter, and
actor, which identify specifications related to graph
interfaces, hierarchical subsystems, dataflow parame-
ters, and actor configurations, respectively. DIF also
allows user-defined attributes, which have a syntax
similar to that of built-in attributes, except that they need
to be declared with the attribute keyword.
The DIF language has recently been augmented
with constructs for supporting topological pat-
terns [Sane et al., 2010]. Topological patterns allow
concise specification of functional structures at the
dataflow graph (inter-actor) level. They can effec-
tively represent many of the flowgraph substructures
that are pervasive in the DSP application domain
(e.g., chain, ring, and butterfly) to generate compact,
scalable application representations. We direct
readers to [Sane et al., 2010, 2011] for more informa-
tion on the concept of topological patterns and how
DIF supports them.
To facilitate use of the DIF language, the DIF
package (TDP) has been built (see Fig. 8). Along
with the ability to transform DIF descriptions into
manipulable internal representations, TDP contains
graph utilities, optimization engines, verification
techniques, a comprehensive functional simulation
framework, and a software synthesis framework for
generating C code [Hsu et al., 2005; Plishker et al.,
2008]. These facilities make TDP an effective en-
vironment for modeling dataflow applications, pro-
viding interoperability with other design environ-
ments, and developing and experimenting with new
tools and dataflow techniques. Beyond these fea-
tures, DIF is also suitable as a design environment
for implementing dataflow-based application repre-
sentations. Describing an application graph is done
by listing nodes (actors) and edges, and then anno-
tating dataflow specific information as well as other
(non-dataflow) kinds of relevant information associ-
ated with actors, edges, and design subsystems.
The framework in DIF for simulation and func-
tional verification of applications, which is based on
CFDF semantics, allows application specifications in
DIF to be used as executable references for rapid
system prototyping and developing further platform-
specific implementations. CFDF, which supports dy-
namic dataflow behaviors, allows flexible and efficient
prototyping of dataflow-based application represen-
tations, and permits natural description of both dy-
namic and static dataflow actors. More information
on CFDF semantics can be found in [Plishker et al.,
2008].
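In CFDF, an actor is described by a set of modes, each with fixed consumption and production rates; an enable function tests whether the current mode has enough input, and an invoke function executes that mode and returns the next one. The Python sketch below illustrates this idea for a decimator with two modes. It is a conceptual sketch with hypothetical names, not the actor interface of the DIF package.

```python
from collections import deque

# Conceptual CFDF-style actor: per-mode rates, enable(), and invoke().


class CfdfDecimator:
    # Tokens consumed and produced in each mode.
    CONSUMPTION = {"keep": 1, "drop": 1}
    PRODUCTION = {"keep": 1, "drop": 0}

    def __init__(self, factor):
        self.factor = factor
        self.count = 0
        self.mode = "keep"

    def enable(self, inp):
        """True when the input FIFO holds enough tokens for the current mode."""
        return len(inp) >= self.CONSUMPTION[self.mode]

    def invoke(self, inp, out):
        """Execute the current mode once and return the next mode."""
        token = inp.popleft()
        if self.mode == "keep":
            out.append(token)
        self.count = (self.count + 1) % self.factor
        self.mode = "keep" if self.count == 0 else "drop"
        return self.mode


inp, out = deque(range(6)), deque()
actor = CfdfDecimator(3)
while actor.enable(inp):
    actor.invoke(inp, out)
print(list(out))
```

Because each mode has constant rates, a tool can inspect the modes and their transitions to recognize static behavior, such as the cyclo-static pattern exhibited by this actor.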
3.3. Related Work
There exist high-end reusable, modular, scal-
able, and reconfigurable FPGA platforms such as
the Berkeley Emulation Engine 2 (BEE2) [Chang
et al., 2005], IBOB [Parsons et al., 2006], and Uni-
Board [Szomoru, 2011], which have been introduced
specifically for DSP systems. These have been widely
used for radio astronomy applications. The BEE2
uses SDF as a unified computation model for both
the microprocessor and the reconfigurable fabric. It
uses a high-level block diagram design environment
based on The Mathworks’ Simulink and the Xilinx
System Generator (XSG). This design environment,
however, does not expose the underlying dataflow
model. In particular, the designer has little or no
scope to make use of the underlying dataflow model
for experimentation (as mentioned earlier in Sec-
tion 1). Also, the SDF model used for program-
ming the BEE2 is a static dataflow model in that all
the dataflow information is available at compile-time
(i.e., before executing or running the application).
Though this feature provides maximal compile-time
predictability, it has limited expressive power. It
does not allow for data-dependent, dynamic behav-
ior, which is exhibited by many modern DSP ap-
plications, such as the TDD application introduced
in Section 2 (see [Bhattacharyya et al., 2010] for
more examples of such applications). Other forms
of dataflow models that can capture more applica-
tion dynamics with acceptable levels of compile-time
predictability may better exploit the features offered
by platforms such as the BEE2. We should, how-
ever, mention that the CASPER DSP library offers
a software register block that can provide limited pa-
rameterization in the design. We have used this block
extensively in our TDD design.
There are some other FPGA design solutions
and tool flows available (e.g., those from Nallat-
ech [Nallatech website], and Lyrtech [Lyrtech web-
site]). These, however, are commercial tools and do
not provide open-source DSP software libraries as
CASPER does. Also, unlike these other commercial
tools, the CASPER tools support most of
the Xilinx FPGA devices.
Model-based approaches for designing large-scale
signal processing systems with a focus on radio
telescopes have been previously studied (e.g., see [Alliot
and Deprettere, 2004; Lemaitre and Deprettere, 2006;
Lemaitre, 2008]). Several frameworks have been
proposed for model-based, high-level abstractions of
architectures along with performance/cost estimation
methods to guide the designer throughout the de-
velopment cycle (see [Alliot and Deprettere, 2004]).
However, the focus of these approaches has been on
architecture exploration. There have also been at-
tempts to derive implementation-level specifications
starting from system-level specifications by segregat-
ing signal processing and control flow (see [Lee and
Seshia, 2011] for more information on control flow)
into an application specification and architecture
specification, respectively (see [Lemaitre and De-
prettere, 2006; Lemaitre, 2008]). However, the choice
of models of computation has been made primarily
from control flow considerations rather than dataflow
considerations. These approaches, though relevant,
do not specifically address the issue of high-level ap-
plication specification for platform-independent pro-
totyping and use of models of computation for ab-
straction of heterogeneous or hybrid dataflow behav-
iors. This issue is critical to efficient prototyping
of high performance signal processing applications,
which are typically dataflow dominated, and include
increasing levels of dynamic dataflow behavior (e.g.,
see [Bhattacharyya et al., 2010]).
We address this issue using the CFDF model with
underlying PSDF or PCSDF behavior and using it
for system prototyping. We then show how platform-
independent specifications based on this modeling
technique can be used to efficiently develop platform-
specific implementations.
4. Dataflow-based Design and Implementation
of a TDD
We propose an approach for design and implemen-
tation of a TDD based on the dataflow formalisms
discussed in Section 3.1 along with relevant capabil-
ities of the DIF tool described in Section 3.2. Fig. 9
gives an overview of our dataflow based approach,
which we now describe.
4.1. Modeling and Prototyping using DIF
We start with an application specification that de-
scribes the DSP algorithm under consideration (in
this case, the TDD) along with proper input and
output interfaces. The application is specified using
the DIF language. This DIF specification consists of
topological information about the dataflow graph —
interconnections between the actors along with in-
put and output interfaces. The DIF specification is
a platform-independent, high-level application spec-
ification. The specification can be used, for example,
to simulate the application, given the library of ac-
tors from which the specification is constructed.
Depending upon the application under consider-
ation, the designer can select among a variety of
dataflow models of computation in DIF to effectively
capture relevant aspects of the application dynamics.
It should be noted that the designer does not always
need to specify the model in advance. The CFDF
model can be used to describe individual modules
(actors) in the application, and the DIF package can
analyze the CFDF representation (CFDF modes, to
be specific) of the actors, as specified by the designer
through the actor code, and annotate the actors with
additional dataflow information using various tech-
niques for identifying specialized forms of dataflow
behavior (e.g., see [Plishker et al., 2010]). This step
requires the functionality of individual actors to be
specified in CFDF semantics. The designer can use
the existing blocks from the Java actor library in DIF
or develop his or her own library of CFDF actors.
In terms of tunability, the key components of the
TDD, as seen from Fig. 3, are the tunable FIR filter
and tunable decimation filter blocks. The tunable decimation
filter (TDF) block is of particular interest,
considering that it is the only multirate block in the
system. Its behavior resembles that described in
Section 3.1.3. In view of this, we have
identified PSDF and PCSDF as candidate dataflow
models for efficient implementation of the targeted
TDD system. For this system, we have to take into
account the multiple inputs and outputs to actors,
as mentioned in Section 2.
To illustrate the dataflow behavior of a decimator
actor based on such specifications, Fig. 10(a) and
Fig. 10(b) show a decimator actor with 4 inputs,
4 outputs, and a decimation factor of 6. The decimator
simultaneously receives 4 consecutive samples, one on
each of its 4 inputs. It outputs every sixth input sample,
starting with the first input sample. Each of these output
samples appears on a successive output of the decimator.
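The routing rule described above can be checked with a short Python sketch. It is an illustration only, not code from the DIF actor library, and the function name `csdf_production_vectors` is hypothetical. It assumes that phase k of the actor consumes input samples 4k to 4k + 3 (one per input) and that one period spans as many phases as the decimation factor. For 4 ports and a decimation factor of 6, it yields the production vectors annotated in Fig. 10(b).

```python
def csdf_production_vectors(num_ports, decimation):
    """Return one CSDF production-rate vector per output port.

    Assumptions (illustrative): phase k consumes input samples
    k*num_ports .. k*num_ports + num_ports - 1, one per input port;
    every `decimation`-th sample is kept, starting with sample 0;
    the j-th kept sample is emitted on output port j.
    """
    phases = decimation  # one period consumes num_ports * decimation samples
    vectors = [[0] * phases for _ in range(num_ports)]
    for port in range(num_ports):
        sample_index = port * decimation    # index of the kept sample
        phase = sample_index // num_ports   # phase in which it arrives
        vectors[port][phase] = 1
    return vectors


if __name__ == "__main__":
    for port, vector in enumerate(csdf_production_vectors(4, 6)):
        print("output", port, vector)
```

Under the SDF view of the same actor, the phases are collapsed into a single firing that consumes 6 tokens per input and produces 1 token per output.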
For the sake of simplicity and clarity, we have ex-
cluded the other single rate blocks from the applica-
tion graphs in these figures. In our implementation,
we extend this behavior for an actor with 8 inputs
and outputs. We have created a DIF prototype using
PSDF and PCSDF as underlying models for equiva-
lent CFDF representation of actor blocks. We have
also developed a Java library of actors in DIF adher-
ing to CFDF semantics for all of the blocks.
We then used DIF for software prototyping, anal-
ysis, and functional simulation. The DIF package
uses the DIF specification to generate an intermedi-
ate graph representation, which can then be used as
an input for further graph transformations includ-
ing a scheduling transformation, which determines
the schedule for an application. Here, by a sched-
ule, we mean the assignment of actors to process-
ing resources, and the execution ordering of actors
that share the same resource. The functional simula-
tion capabilities provided in DIF can be used to ana-
lyze and estimate buffer requirements in terms of the
numbers of tokens accumulated on the buffers that
correspond to dataflow graph edges. This provides
an estimate of total memory requirements as well as
specifications for individual buffers when porting the
application to the targeted implementation platform.
Fig. 11 shows the TDD application graph gen-
erated using DIF. This is based on the TDD block
diagram shown in Fig. 3 with addition of some ac-
tors that handle parameter configuration for the ac-
tors. We discard one of the two sets of outputs (more
specifically, sine output) of the localOsc actor as we
have employed a real mixer in our design. The com-
plexity of the graph, which is increased due to mul-
tiple parallel edges between two actors, can easily be
captured through a DIF specification that makes use
of topological patterns. We have shown one of the
possible specifications of the graph topology in DIF
using topological patterns in Fig. 12.
For our design, we have used parameterized looped
schedules (PLSs) [Ko et al., 2007] for PSDF and
PCSDF models to determine the total buffer require-
ments. Using the TDD specification, we construct
PLSs for the TDD application. Fig. 13(a) shows a
PLS for a TDD application, where the decimator ac-
tor has the underlying SDF model, while Fig. 13(b)
shows one in which the decimator actor employs the
CSDF model. We have used the generalized schedule
tree (GST) representation for the PLSs [Ko et al.,
2007]. An internal node of a GST denotes a loop
count, while a leaf node represents an actor. The ex-
ecution of a schedule involves traversing the GST in
a depth-first manner, and during this traversal, the
sub-schedule rooted at any internal node is executed
as many times as specified by the loop count of that
node. As annotated in these GSTs, loop counts p0,
p1, and p2 are parameterizable. The loop count p0
is set to a user-specified number of iterations, while
the loop counts p1 and p2 are tuned based upon the
decimation factor as well as the underlying dataflow
model for the decimator. Fig. 13(a) and (b), in
particular, show values of the parameterizable loop
counts set for a decimator with a decimation factor
of 11. This PLS can be viewed as providing CFDF-
based execution for the given PDF-based actor spec-
ification model.
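The depth-first execution of a GST can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the classes `Leaf` and `Loop`, the function `execute`, and the small tree `TOY_GST` are hypothetical and do not reproduce the schedule trees of Fig. 13 or the DIF implementation. Loop counts may be fixed integers or parameter names that are resolved when the schedule runs.

```python
class Leaf:
    """Leaf node: a single actor firing."""

    def __init__(self, actor):
        self.actor = actor


class Loop:
    """Internal node: repeat the child sub-schedules `count` times.

    `count` is either an int or the name of a run-time parameter.
    """

    def __init__(self, count, children):
        self.count = count
        self.children = children


def execute(node, params, fire):
    """Traverse the tree depth-first, calling fire(actor) at each leaf."""
    if isinstance(node, Leaf):
        fire(node.actor)
        return
    count = params[node.count] if isinstance(node.count, str) else node.count
    for _ in range(count):
        for child in node.children:
            execute(child, params, fire)


# Hypothetical tree: p0 outer iterations, each running an inner loop
# p1 times and then firing the sink once.
TOY_GST = Loop("p0", [
    Loop("p1", [Leaf("source"), Leaf("decimator")]),
    Leaf("sink"),
])


def firing_sequence(tree, params):
    """Collect the actor firing order produced by one schedule run."""
    fired = []
    execute(tree, params, fired.append)
    return fired


if __name__ == "__main__":
    print(firing_sequence(TOY_GST, {"p0": 2, "p1": 3}))
```

Changing the entries of `params` retunes the schedule without rebuilding the tree, which is the property that makes parameterized loop counts attractive for a tunable design.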
Table 1 shows the total buffer requirements us-
ing PLSs shown in Fig. 13(a) and (b) for various
configurations of decimation factors. Note that for
a given configuration (setting of graph parameters),
a PSDF or PCSDF graph behaves like an SDF or
CSDF graph, respectively. It can be seen that for
the SDF model, the total buffer requirements vary
with the decimation factor, and this is due to in-
put buffers to the TDD block that need to accumu-
late varying numbers of tokens. Thus, employing the
PSDF model will require tuning buffer sizes for dif-
ferent decimation factors if one wants to provide for
optimized buffer sizes in terms of graph parameters.
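The trend in Table 1 can be verified with a short calculation. The Python sketch below copies the tabulated values and checks that the SDF totals grow by 8 tokens for each unit increase in the decimation factor, which is consistent with 8 input edges that each accumulate one additional token. The constant offset of 92 tokens is simply fitted to the table and is an assumption of this sketch, not a figure stated in the text.

```python
# Values copied from Table 1 (total buffer requirements in tokens).
TABLE1_SDF = {5: 132, 6: 140, 7: 148, 8: 156,
              9: 164, 10: 172, 11: 180, 12: 188}
TABLE1_CSDF = {d: 100 for d in range(5, 13)}

NUM_INPUTS = 8      # the implemented decimator has 8 inputs
FITTED_OFFSET = 92  # assumption: obtained by fitting Table 1


def sdf_total(decimation):
    """Linear model of the SDF totals: each input edge holds D tokens."""
    return FITTED_OFFSET + NUM_INPUTS * decimation


if __name__ == "__main__":
    for d in sorted(TABLE1_SDF):
        print(d, TABLE1_SDF[d], sdf_total(d), TABLE1_CSDF[d])
```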
We have used the CASPER tool flow for devel-
oping our platform-specific implementation as ex-
plained later in Section 4.2. This implementation
is targeted to an FPGA. Our objective here is to
support tuning the decimation factor without regen-
erating hardware code. A dataflow buffer can be im-
plemented using a FIFO or dual-port random access
memory (RAM) block in the targeted FPGA device.
The size of the available FIFO block can be set to
2^n, where n ≥ 1. This gives limited control over setting
the FIFO size, and may increase the resource
utilization. At the same time, tuning the sizes of
FIFO or dual-port RAM blocks is not possible during
run-time. It is in general possible to set the size
of a FIFO or dual-port RAM block to a maximum required
value, and access only a part of it using a tunable
address counter during run-time. This, however,
again may lead to unnecessarily increased resource utilization.
The ADC output is of a streaming nature
(data is produced or consumed at every clock cycle
without any synchronization signal), as is the DSP
subsystem downstream of the TDD.
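The cost of power-of-two FIFO sizing can be illustrated with a small Python sketch. It assumes, for illustration only, that a single SDF input edge must hold as many tokens as the decimation factor; the function name `fifo_depth` is hypothetical.

```python
def fifo_depth(required_tokens):
    """Smallest FIFO depth of the form 2**n (n >= 1) that fits the request."""
    n = 1
    while (1 << n) < required_tokens:
        n += 1
    return 1 << n


def worst_case_depth(decimation_factors):
    """Depth needed if one fixed FIFO must serve every supported factor."""
    return max(fifo_depth(d) for d in decimation_factors)


if __name__ == "__main__":
    for d in range(5, 13):
        depth = fifo_depth(d)
        print("D =", d, "depth =", depth, "unused =", depth - d)
    print("fixed depth for 5..12:", worst_case_depth(range(5, 13)))
```

Under this assumption, factors 5 to 8 fit a depth of 8 while factors 9 to 12 need a depth of 16, so a FIFO fixed at synthesis time to cover the whole range leaves up to 11 entries unused.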
In order to meet the throughput constraint imposed
by the maximum data rate of the ADC output
stream, SDF buffers need to be pipelined, which
is not efficient using RAM blocks. Thus, we use
the CSDF model, which does not require tuning
of dataflow buffer sizes to meet the maximum
throughput constraint, as observed from our DIF-based
prototype. The TDD generates a synchronization
or enable signal indicating valid output data.
This can be used as a clock to drive the downstream
DSP system.
We use our DIF prototype as a reference while in-
tegrating the design with the current CASPER tool
flow for the target implementation on the IBOB. Sec-
tion 4.2 further elaborates on this approach along
with implementation results.
4.2. Integration with the CASPER Tool Flow
The CASPER tool flow is based on the BEE XPS
tool flow [Parsons et al., 2006]. This tool flow re-
quires that an application be specified as a Simulink
model using XSG [Parsons et al., 2006]. Since
there is no automated tool for transforming a DIF
representation into an equivalent Simulink model,
porting the DIF specification to Simulink/XSG re-
quires manual transcoding of the DIF specification.
This also requires implementing parameterizable ac-
tor blocks that are currently not available in the
XSG, CASPER, or BEE XPS libraries.
Each actor gets transformed into an equivalent
functional XSG block. For each of the Simulink actor
blocks, we provide a pre-synthesis parameterization
that allows changing block parameters before hard-
ware synthesis (see [Parsons et al., 2007] for more
details on Simulink scripting). In order to implement
our objective of tunability (post-synthesis
parameterization), we use the software register mechanism
in the BEE XPS library to specify parameters
that change during run-time (that is, after hardware
code is generated, and depending upon user requirements).
Software registers can be accessed and set during
run-time from the TinyShell interface available for
IBOB. This allows tuning TDD parameters without
re-synthesizing the hardware each time the parame-
ters change from the previous setting. Each block has
an enable input signal. Through systematic trans-
formations, an application graph in DIF can be con-
verted into an equivalent Simulink/XSG model. We
have developed an interface software package using
C programs, and Bash and Python scripts to com-
pute software register values for the required TDD
configuration, and set these values on the IBOB over
a telnet connection, which is used for remote access
to the hardware platform at NRAO.
On the targeted FPGA device, we have implemented
the NCO using dual-port RAM blocks that are
loaded with pre-computed sinusoidal signal values of
the required precision. Each of these dual-port RAM
blocks is used to simultaneously read sine and cosine
values from its two ports. The oscillator frequency
is set using a software register, and depends
upon the desired output signal band.
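A table-based NCO of this kind can be modeled in software as follows. This Python sketch is an illustration under several assumptions that are not taken from the text: a phase-accumulator architecture, a 1024-entry table, 16-bit signed amplitudes, a 32-bit accumulator, and the names `SINE_TABLE`, `tuning_word`, and `nco`. The two table reads per sample stand in for the two ports of a dual-port RAM, and the tuning word stands in for the value held in a software register.

```python
import math

TABLE_BITS = 10                # assumption: 1024-entry table
TABLE_SIZE = 1 << TABLE_BITS
AMPLITUDE = (1 << 15) - 1      # assumption: 16-bit signed samples
ACC_BITS = 32                  # assumption: 32-bit phase accumulator

# Pre-computed sinusoid, as would be loaded into a RAM block.
SINE_TABLE = [round(AMPLITUDE * math.sin(2 * math.pi * k / TABLE_SIZE))
              for k in range(TABLE_SIZE)]


def tuning_word(f_out, f_clk):
    """Phase increment per clock for the requested output frequency."""
    return round(f_out / f_clk * (1 << ACC_BITS)) & ((1 << ACC_BITS) - 1)


def nco(word, num_samples):
    """Return (cosine, sine) pairs read from the shared table."""
    phase = 0
    mask = (1 << ACC_BITS) - 1
    samples = []
    for _ in range(num_samples):
        sin_addr = phase >> (ACC_BITS - TABLE_BITS)         # first port
        cos_addr = (sin_addr + TABLE_SIZE // 4) % TABLE_SIZE  # second port
        samples.append((SINE_TABLE[cos_addr], SINE_TABLE[sin_addr]))
        phase = (phase + word) & mask
    return samples


if __name__ == "__main__":
    word = tuning_word(200e6, 800e6)  # illustrative frequencies
    print(word, nco(word, 4))
```

Changing the tuning word retunes the oscillator without touching the table contents, which mirrors the role of the software register described above.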
In our current implementation, the TDF block
(see Fig. 3) can have up to 16 filter taps. We
have also implemented a tunable FIR filter block,
which does not decimate, shown in Fig. 3. This
block can have up to 8 taps in our implementa-
tion. These, again, are set using software registers.
Fig. 4(b) shows the schematic of a TDF. As shown
in this figure, we have employed two filter banks (16-
tap units) inside our design of a TDF block that
operate in tandem to allow maximum throughput
(that is, the maximum data rate of the ADC out-
put stream). Hence, our TDF block has 32 multi-
plication operations. As mentioned earlier, our TDF
design employs a polyphase implementation as de-
scribed in [Vaidyanathan, 1990]. The software com-
putes the sequence in which the input signals should
be routed to an appropriate filter tap for a given dec-
imation factor. This information is then fed to the
signal routing scheme using software registers.
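The polyphase idea can be illustrated with a short Python sketch that compares a direct "filter, then discard samples" decimator with a polyphase arrangement in which the coefficients are split into one sub-filter per phase. This is a generic textbook formulation in the spirit of [Vaidyanathan, 1990], not the routing scheme of the TDF hardware, and the function names are hypothetical.

```python
def fir_then_decimate(x, h, d):
    """Reference: FIR-filter x with taps h, keep every d-th output."""
    y = []
    for n in range(0, len(x), d):
        acc = 0
        for k, tap in enumerate(h):
            if n - k >= 0:
                acc += tap * x[n - k]
        y.append(acc)
    return y


def polyphase_decimate(x, h, d):
    """Same result, computed with d sub-filters e_p[m] = h[m*d + p]."""
    subfilters = [h[p::d] for p in range(d)]
    num_out = (len(x) + d - 1) // d
    y = [0] * num_out
    for p, sub in enumerate(subfilters):
        for n_out in range(num_out):
            for m, tap in enumerate(sub):
                idx = (n_out - m) * d - p  # input sample feeding this tap
                if 0 <= idx < len(x):
                    y[n_out] += tap * x[idx]
    return y


if __name__ == "__main__":
    x = list(range(1, 25))
    h = [1, 2, 3, 4, 5, 6, 7]
    print(fir_then_decimate(x, h, 6))
    print(polyphase_decimate(x, h, 6))
```

Each sub-filter only ever sees input samples from one phase, so for a given decimation factor the mapping from input samples to filter taps is fixed and can be computed ahead of time in software.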
Table 2 shows results for the TDD implementa-
tion on the IBOB using the Xilinx EDK 7.1.2. We
have used this hardware platform and tool for all of
the experiments reported in the remainder of the pa-
per. Design 1 shows some of the device utilization
parameters for a TDD that supports only baseband
modes. This design does not include the tunable
FIR filter, NCO, and mixer blocks shown in Fig. 3.
Design 2 is based on the block diagram of a TDD
shown in Fig. 3. As evaluation metrics for hardware
cost, we have used the utilization of FPGA slices, 4-
input look-up tables (LUTs), and block RAM units,
and the number of embedded multipliers. Note that
neither of these two designs uses any of the available
embedded multipliers for multiplication. Designs 3
and 4 are modified versions of designs 1 and 2, re-
spectively, in that they employ embedded 18 × 18
multipliers. It can be seen that using embedded mul-
tipliers does not provide significant improvements in
hardware cost. We observe that use of embedded
multipliers, in fact, needs to be accompanied by ad-
dition of extra latency in the design to achieve tim-
ing closure. We have been able to achieve maximum
throughput using an implementation based on the
PCSDF model.
4.3. Platform-specific Analysis using DIF
It is common to go back and forth between a high-
level prototype and a corresponding platform-specific
implementation while designing an embedded DSP
system. Such alternation in design phases is com-
mon, for example, when one is developing a platform-
specific library or tool flow. In support of such a de-
sign methodology, it is desirable for a high-level de-
sign tool to support platform-specific analysis. This
can be achieved by annotating the high-level appli-
cation specification with platform-specific implemen-
tation parameters, which are derived through device
data sheets, experimentation or some combination of
both.
DIF supports specifying user-defined actor param-
eters. We use this feature in DIF to annotate ac-
tors with two relevant implementation parameters
— the latency constraint, and number of embedded
multipliers. This allows estimating results based on
the DIF prototype itself instead of determining them
from the constructed design, which is generally time
consuming. We have verified the accuracy of metrics
estimated by our DIF model compared with actual
hardware synthesis results that are shown in Table 2.
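The kind of estimate described above can be sketched in Python as follows. The dictionary `ANNOTATIONS`, its keys, and all numeric values are hypothetical placeholders chosen for illustration; they are not measurements from this work and do not reflect the DIF syntax for user-defined parameters.

```python
# Hypothetical per-actor annotations (placeholder values only).
ANNOTATIONS = {
    "bpf":        {"latency_cycles": 4, "embedded_multipliers": 8},
    "multiplier": {"latency_cycles": 2, "embedded_multipliers": 8},
    "decimator":  {"latency_cycles": 6, "embedded_multipliers": 32},
}


def estimate(chain, annotations):
    """Sum the annotated costs of every actor along a processing chain.

    Returns (total latency in cycles, total embedded multipliers).
    """
    total_latency = sum(annotations[a]["latency_cycles"] for a in chain)
    total_mults = sum(annotations[a]["embedded_multipliers"] for a in chain)
    return total_latency, total_mults


if __name__ == "__main__":
    print(estimate(["decimator"], ANNOTATIONS))
    print(estimate(["bpf", "multiplier", "decimator"], ANNOTATIONS))
```

Because the estimate is computed from the graph and its annotations alone, alternative configurations can be compared before any hardware synthesis run.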
Developers of tool flows and DSP libraries can profile
their library blocks to determine a wide variety of
platform-specific implementation parameters. DIF
can use such information to estimate implementation
parameters at a high level of abstraction, and
earlier in the design cycle, to help efficiently prune
segments of the design space. Support for estimation
of various platform-specific resources for different
platforms is beyond the scope of this paper. It
is, however, an important direction toward developing
alternative model-based design flows and open-access
tool flows for astronomical DSP solutions.
4.4. Exploring Implementation Trade-offs between
TDD and FDD Designs
One of the motivations for the work presented in
this paper has been to develop library blocks needed
for a TDD using Xilinx LogicCore and CASPER li-
brary blocks. The current CASPER DSP library
provides a decimator (see Fig. 4(a)) that supports
decimation factors that are powers of 2. The decima-
tion factor as well as the filter coefficients of the FIR
filter are not tunable after the hardware code is gen-
erated. Our design provides flexibility with not only
the decimation factor but also the filter coefficients
through the use of software registers, as explained
earlier. The FDD designs, though not tunable, have
lower hardware cost in terms of device utilization.
Table 3 provides a summary of some of the hardware
utilization parameters for the FDD designs. These
designs have also been implemented on a CASPER
IBOB. The decimation factor of 10 has been achieved
by first interpolating the input by a factor of 80, and
then decimating it by a factor of 8. Comparison be-
tween the results in this table and those in Table 2
clearly highlights the trade-off between design flex-
ibility and hardware cost. Using the model-based
approach presented in this paper, the designer can
effectively explore this trade-off based on the given
design requirements.
4.5. TDD and FDD for Multistage Downconversion
Though our TDD design supports limited decima-
tion factors (integer factors between 5 and 12), its
usage is not limited to these factors. It can be read-
ily scaled and applied to achieve other decimation
factors by cascading multiple TDF blocks. Fig. 14
shows some of the possible input/output sampling
rate relations that can be achieved by such use of
cascaded TDF blocks. Design 1 in Table 4 employs
cascaded TDF blocks, while design 2 in Table 4 em-
ploys cascaded fixed-configuration decimation filter
(FDF) blocks. Both of these designs have been de-
veloped to demonstrate multistage downconversion
for a baseband signal and hence, do not employ mix-
ers. It is possible to extend these designs to include a
mixer to allow all possible narrow band outputs and
not just the baseband output. For all of the designs
in this table that use one or more TDF blocks, the
TDF block employs dedicated embedded multipliers.
In this light, we further explore the trade-off be-
tween the low hardware cost of FDD designs and flex-
ibility offered by TDD designs by examining a design
consisting of an FDF block followed by a TDF block
(designs 3 and 4 in Table 4). These designs provide
limited tunable decimation factors compared to de-
sign 1, but also have lower hardware cost in terms of
device utilization.
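The sampling-rate relations of Fig. 14 and the range of factors reachable by cascading can be reproduced with a short Python sketch. The function names are hypothetical; the TDF range of 5 to 12 and the fixed factor of 8 for the FDF stage follow the text and Fig. 4, while treating the first stage of the mixed design as a single fixed factor of 8 is an assumption made for illustration.

```python
from itertools import product

TDF_FACTORS = range(5, 13)  # integer factors supported by one TDF block


def output_rate(input_rate_msps, stage_factors):
    """Output sample rate (MS/s) after decimating by each stage in turn."""
    rate = input_rate_msps
    for factor in stage_factors:
        rate /= factor
    return rate


def cascaded_factors(first_stage, second_stage):
    """All overall decimation factors reachable with two cascaded stages."""
    return sorted({a * b for a, b in product(first_stage, second_stage)})


if __name__ == "__main__":
    # The three configurations shown in Fig. 14 (input rate 1600 MS/s).
    for stages in ([2, 11], [8, 5], [10, 5]):
        print(stages, output_rate(1600, stages))
    print("TDF + TDF:", cascaded_factors(TDF_FACTORS, TDF_FACTORS))
    print("FDF(8) + TDF:", cascaded_factors([8], TDF_FACTORS))
```

The second listing makes the trade-off concrete: a fixed first stage followed by a TDF reaches only 8 overall factors, whereas two TDF stages reach many more at a higher hardware cost.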
5. Summary and Conclusions
We have proposed a dataflow-based approach for
prototyping radio astronomy DSP systems. We have
used a dataflow-based high-level application model
that provides a platform-independent specification,
and assistance in functional verification and important
resource estimation tasks. This can prove effective
in shortening the development cycle and speeding
deployment of DSP systems across various target
platforms. We have employed this approach to methodically
develop a TDD-based DSP backend design.
Our TDD implementation is targeted to the
CASPER FPGA board, called IBOB, and supports
tuning narrow band modes without the need for re-
generating hardware code. We have also explored
the trade-off between the low hardware cost for FDD
designs and the flexibility offered by TDD designs.
This trade-off has also been highlighted in the con-
text of designs employing a two-stage downconver-
sion scheme. A designer can explore this design space
to best meet the application requirements. Expand-
ing on our work to integrate TDDs with ongoing de-
velopment of spectrometer designs at the NRAO on
the latest CASPER hardware is a natural extension
of the work presented in this paper.
There is a growing interest in the radio astronomy
community in having open-access and portable astronomical
signal processing solutions. Currently, this is
constrained by proprietary commercial tools targeted
for specific platforms. We have also relied on these
tools, mainly for hardware synthesis and code generation,
in our work. In this context, it is of interest
to have high-level application description languages
with semantic foundations in models of computation,
and the corresponding design tools for efficient specification,
simulation, functional verification, and synthesis.
Developing model-based, platform-specific libraries,
and devising techniques for automatic code
generation from high-level representations, such as
those in DIF, specifically for the radio astronomy domain
is an important direction for future research.
Acknowledgments. This research was sponsored in
part by the National Radio Astronomy Observatory, Aus-
trian Marshall Plan Foundation, and National Science
Foundation (grant AGS-0959761 to New Jersey Institute
of Technology). We acknowledge with thanks the contri-
butions of Shilpa Bollineni, Srikanth Bussa, Randy Mc-
Cullough, Scott Ransom, and Jason Ray of the National
Radio Astronomy Observatory. The National Radio As-
tronomy Observatory is a facility of the National Science
Foundation operated under cooperative agreement by As-
sociated Universities, Inc.
References
Alliot, S., and E. Deprettere (2004), Architecture ex-
ploration of a large scale system, in Proceedings of
the IEEE International Workshop on Rapid System
Prototyping, pp. 217–224, Geneva, Switzerland, doi:
10.1109/IWRSP.2004.1311120.
Bhattacharya, B., and S. S. Bhattacharyya (2000), Pa-
rameterized dataflow modeling of DSP systems, in
Proceedings of the International Conference on Acous-
tics, Speech, and Signal Processing, pp. 1948–1951, Is-
tanbul, Turkey.
Bhattacharyya, S. S., R. Leupers, and P. Marwedel
(2000), Software synthesis and code generation for
DSP, IEEE Transactions on Circuits and Systems —
II: Analog and Digital Signal Processing, 47 (9), 849–
875.
Bhattacharyya, S. S., E. Deprettere, R. Leupers, and
J. Takala (Eds.) (2010), Handbook of Signal Processing
Systems, Springer.
Bilsen, G., M. Engels, R. Lauwereins, and J. A. Peper-
straete (1996), Cyclo-static dataflow, IEEE Transac-
tions on Signal Processing, 44 (2), 397–408.
Buck, J. T. (1993), Scheduling dynamic dataflow graphs
with bounded memory using the token flow model,
Ph.D. thesis, EECS Department, University of Cali-
fornia, Berkeley.
CASPER Website (), Collaboration for astron-
omy signal processing and electronics research,
http://casper.berkeley.edu.
Chang, C., J. Wawrzynek, and R. W. Brodersen (2005),
BEE2: a high-end reconfigurable computing system,
Design & Test of Computers, IEEE, 22 (2), 114–125,
doi:10.1109/MDT.2005.30.
Eker, J., and J. W. Janneck (2003), CAL language report,
language version 1.0 — document edition 1, Tech. Rep.
UCB/ERL M03/48, Electronics Research Laboratory,
University of California at Berkeley.
Ford, J., and J. Ray (2010), An application of high-
performance reconfigurable computing in radio as-
tronomy signal processing, in High-Performance Re-
configurable Computing Technology and Applications
(HPRCTA), Fourth International Workshop on, pp.
1–7, doi:10.1109/HPRCTA.2010.5670794.
Haubelt, C., J. Falk, J. Keinert, T. Schlichter,
M. Streubühr, A. Deyhle, A. Hadert, and J. Teich
(2007), A SystemC-based design methodology for digital
signal processing systems, EURASIP Journal on
Embedded Systems, 2007, Article ID 47580, 22 pages.
Hsu, C., M. Ko, and S. S. Bhattacharyya (2005), Soft-
ware synthesis from the dataflow interchange format,
in Proceedings of the International Workshop on Soft-
ware and Compilers for Embedded Systems, pp. 37–49,
Dallas, Texas.
Hsu, C., I. Corretjer, M. Ko, W. Plishker, and S. S.
Bhattacharyya (2007), Dataflow interchange format:
Language reference for DIF language version 1.0,
user’s guide for DIF package version 1.0, Tech. Rep.
UMIACS-TR-2007-32, Institute for Advanced Com-
puter Studies, University of Maryland at College Park,
also Computer Science Technical Report CS-TR-4871.
Johnson, G. (1997), LabVIEW Graphical Programming:
Practical Applications in Instrumentation and Con-
trol, McGraw-Hill School Education Group.
Ko, M., C. Zissulescu, S. Puthenpurayil, S. S. Bhat-
tacharyya, B. Kienhuis, and E. Deprettere (2007), Pa-
rameterized looped schedules for compact represen-
tation of execution sequences in DSP hardware and
software implementation, IEEE Transactions on Sig-
nal Processing, 55 (6), 3126–3138.
Kwon, S., H. Jung, and S. Ha (2004), H.264 decoder al-
gorithm specification and simulation in simulink and
PeaCE, in Proceedings of the International SoC Design
Conference, pp. 9–12.
Lee, E. A., and D. G. Messerschmitt (1987), Static
scheduling of synchronous dataflow programs for digi-
tal signal processing, IEEE Transactions on Comput-
ers, C-36 (1), 24–35, doi:10.1109/TC.1987.5009446.
Lee, E. A., and S. A. Seshia (2011), Introduction to Em-
bedded Systems, A Cyber-Physical Systems Approach,
http://LeeSeshia.org.
Lemaitre, J. (2008), Model-based specification and de-
sign of large-scale embedded signal processing systems,
Ph.D. thesis, Leiden University, The Netherlands.
Lemaitre, J., and E. Deprettere (2006), FPGA implemen-
tation of a prototype hierarchical control network for
Large-Scale signal processing applications, in Proceed-
ings of the International Euro-Par Conference, Lec-
ture Notes in Computer Science 4128, pp. 1192–1203,
Springer, Dresden, Germany.
Lyrtech website (), Lyrtech, http://www.lyrtech.com.
McAllister, J., R. Woods, R. Walke, and D. Reilly (2004),
Synthesis and high level optimisation of multidimen-
sional dataflow actor networks on FPGA, in Proceed-
ings of the IEEE Workshop on Signal Processing Sys-
tems.
Murthy, P. K., and E. A. Lee (2002), Multidimensional
synchronous dataflow, IEEE Transactions on Signal
Processing, 50 (8), 2064–2079.
Nallatech website (), Nallatech,
http://www.nallatech.com.
Parks, T. M., J. L. Pino, and E. A. Lee (1995), A
comparison of synchronous and cyclo-static dataflow,
in Proceedings of the IEEE Asilomar Confer-
ence on Signals, Systems, and Computers, vol. 1,
pp. 204–210, Pacific Grove, California, doi:
10.1109/ACSSC.1995.540541.
Parsons, A., et al. (2005), A new approach to radio as-
tronomy signal processing, in Proceedings of the Gen-
eral Assembly of the International Union of Radio Sci-
ence.
Parsons, A., et al. (2006), PetaOp/Second FPGA signal
processing for SETI and radio astronomy, in Proceed-
ings of the IEEE Asilomar Conference on Signals, Sys-
tems, and Computers, pp. 2031–2035, Pacific Grove,
California, doi:10.1109/ACSSC.2006.355123, invited
paper.
Parsons, A., D. Chapman, and H. Chen (2007), Xil-
inx system generator for DSP in the CASPER group,
Tech. Rep. CASPER Memo 11, Center for Astronomy
Signal Processing and Electronic Research, University
of California, Berkeley.
Pino, J. L., S. Ha, E. A. Lee, and J. T. Buck (1995),
Software synthesis for DSP using Ptolemy, Journal of
VLSI Signal Processing, 9 (1).
Plishker, W., N. Sane, M. Kiemb, K. Anand, and S. S.
Bhattacharyya (2008), Functional DIF for rapid proto-
typing, in Proceedings of the International Symposium
on Rapid System Prototyping, pp. 17–23, Monterey,
California.
Plishker, W., et al. (2010), Model-based DSP imple-
mentation on FPGAs, in Proceedings of the Interna-
tional Symposium on Rapid System Prototyping, Fair-
fax, Virginia, invited paper.
Saha, S., S. Puthenpurayil, and S. S. Bhattacharyya
(2006), Dataflow transformations in high-level DSP
system design, in Proceedings of the International
Symposium on System-on-Chip, pp. 131–136, Tam-
pere, Finland, invited paper.
Sane, N., H. Kee, G. Seetharaman, and S. S. Bhat-
tacharyya (2010), Scalable representation of dataflow
graph structures using topological patterns, in Pro-
ceedings of the IEEE Workshop on Signal Processing
Systems, San Francisco Bay Area, USA.
Sane, N., H. Kee, G. Seetharaman, and S. Bhattacharyya
(2011), Topological patterns for scalable representa-
tion and analysis of dataflow graphs, Journal of Signal
Processing Systems, 65, 229–244, doi:10.1007/s11265-011-0610-1.
Shen, C., and S. S. Bhattacharyya (2009), System-
level clustering and timing analysis for GALS-based
dataflow architectures, in Proceedings of the ACM In-
ternational Workshop on Timing Issues in the Spec-
ification and Synthesis of Digital Systems, Austin,
Texas.
Stefanov, T., C. Zissulescu, A. Turjan, B. Kienhuis, and
E. Deprettere (2004), System design using Kahn pro-
cess networks: the Compaan/Laura approach, in Pro-
ceedings of the Design, Automation and Test in Eu-
rope Conference and Exhibition, vol. 1, pp. 340–345,
doi:10.1109/DATE.2004.1268870.
Suhaib, S., D. Mathaikutty, and S. Shukla (2008),
Dataflow architectures for GALS, Electronic Notes
in Theoretical Computer Science, 200, 33–50, doi:
10.1016/j.entcs.2008.02.005.
Szomoru, A. (2011), The UniBoard: A multi-purpose
scalable high-performance computing platform for
radio-astronomical applications, in General Assembly
and Scientific Symposium, 2011 XXXth URSI, pp. 1–
4, doi:10.1109/URSIGASS.2011.6051281.
Thies, W., M. Karczmarek, and S. Amarasinghe (2002),
StreamIt: A language for streaming applications, in
International Conference on Compiler Construction,
Grenoble, France.
Vaidyanathan, P. (1990), Multirate digital filters, filter
banks, polyphase networks, and applications: a tu-
torial, Proceedings of the IEEE, 78 (1), 56–93, doi:
10.1109/5.52200.
S. S. Bhattacharyya, Department of Electrical and
Computer Engineering, and Institute for Advanced Com-
puter Studies, University of Maryland, College Park,
MD, 20742, USA. (ssb@umd.edu)
J. Ford, National Radio Astronomy Observatory,
Green Bank, WV, 24944, USA. (jford@nrao.edu)
A. I. Harris, Department of Astronomy, University
of Maryland, College Park, MD, 20742, USA. (har-
ris@astro.umd.edu)
N. Sane, Department of Physics, and Center for Solar-
Terrestrial Research, New Jersey Institute of Technology,
Newark, NJ, 07102, USA. (nimish.sane@njit.edu)
(Received .)
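[Figure 1 graphic: P → M → R. (a) CSDF: M consumes [1 1 1 1] and
produces [1 0 0 0]. (b) SDF: M consumes 4 and produces 1. In both
cases P produces 1 token and R consumes 1 token.]
Figure 1. An application graph with a simple decimator
actor M using the (a) CSDF, and (b) SDF models. Actor
M is a decimator with a decimation factor of 4.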
[Figure 2 graphic: Data → FIR Filter (CN) → Decimator → Output, where
the Decimator consumes [1 1 ... 1] and produces [1 0 0 ... 0], each of
size 1 × D, and all other rates are 1.]
Figure 2. Modeling a parameterized decimation filter
(DF) application using PCSDF: (a) application graph, where
CN denotes a vector of FIR filter coefficients and D denotes
a decimation factor, and (b) PCSDF representation.
Figure 3. Block diagram of a tunable digital downconverter.
Figure 4. Schematic of (a) fixed-configuration decima-
tion filter (FDF) in the CASPER library, and (b) tunable
decimation filter (TDF) that is part of a TDD. The FDF
achieves downconversion of 8 by having 8 parallel inputs
x[n], x[n − 1], . . . , x[n − 7]. Here, h0, h1, . . . , h7 denote
the filter coefficients, and y[n] denotes the output. For
TDF, 16-tap units are similar to the structure inside the
dotted box shown in (a) with tunable filter taps. The
TDF block has 8 inputs as well as 8 outputs.
Figure 5. Two of the possible configurations of a TDD:
(a) Bw = 160 MHz, Cf = 80 MHz (b) Bw = 320 MHz,
Cf = 480 MHz. The colored area shows the extracted
frequency band.
[Figure 6 graphic: an SDF graph with actors W, X, Y, and Z.]
Figure 6. An SDF graph.
Figure 7. The DIF language.
Figure 8. The DIF Package.
Figure 9. Dataflow-based approach for design and implementation of a TDD.
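Figure 10. Dataflow behavior of a Decimator actor with 4 inputs and outputs for a decimation factor of 6 using the (a) SDF and (b) CSDF models.
[Graph: ADC -> Decimator -> Output over 4 parallel edges; the ADC produces 1 token per edge and Output consumes 1 token per edge. (a) SDF: the Decimator consumes 6 tokens on each input and produces 1 token on each output per firing. (b) CSDF: the Decimator consumes [1 1 1 1 1 1] on each input; its four outputs have production patterns [1 0 0 0 0 0], [0 1 0 0 0 0], [0 0 0 1 0 0], and [0 0 0 0 1 0].]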
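The CSDF production patterns in Figure 10(b) can be reproduced with a short Python sketch. It assumes that in each phase the actor consumes one sample from each parallel input, that samples are numbered in arrival order across the inputs, and that every sixth sample is retained. The function name `production_phases` is hypothetical, and the assignment of patterns to particular output ports is not modeled.

```python
def production_phases(num_inputs=4, factor=6):
    """Return one 0/1 production vector (length `factor`) per retained sample.

    Assumptions: each phase consumes one sample per input, samples are
    numbered in arrival order across inputs, and every `factor`-th sample
    (starting with sample 0) is retained.
    """
    total = num_inputs * factor  # samples consumed over `factor` phases
    vectors = []
    for sample in range(0, total, factor):
        phase = sample // num_inputs  # phase in which this sample arrives
        vec = [0] * factor
        vec[phase] = 1
        vectors.append(vec)
    return vectors


patterns = production_phases(4, 6)
```

With 4 inputs and a decimation factor of 6, the retained samples are 0, 6, 12, and 18, which arrive in phases 0, 1, 3, and 4. These are the four patterns shown in the figure.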
Figure 11. TDD application graph generated using DIF.
[Actors: source, copy, bpf, Merge, multiplier, decimator, sink, control, fork_0_, fork_1_, fork_2_, localOsc, dump.]
Figure 12. Partial DIF specification (topology block) for the TDD application graph using topological patterns.
Figure 13. PLSs for the TDD application configured for a decimation factor of 11, with the decimator actor employing the (a) PSDF and (b) PCSDF models of computation.
[Both schedules begin with a single pass through control, fork_0_, fork_1_, bpf, fork_2_, localOsc, Merge, and decimator, followed by loops over source, copy, bpf, localOsc, multiplier, dump, and Merge, together with decimator and sink. Loop parameters: (a) p0 = 10, p1 = 1, p2 = 11; (b) p0 = 10, p1 = 11, p2 = 1.]
Figure 14. Two-stage digital downconversion.
[Three configurations are shown:
Input sample rate 1600 MS/s -> decimation by 2 -> decimation by 11 -> output sample rate 72.72 MS/s
Input sample rate 1600 MS/s -> decimation by 8 -> decimation by 5 -> output sample rate 40 MS/s
Input sample rate 1600 MS/s -> decimation by 10 -> decimation by 5 -> output sample rate 32 MS/s]
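The output sample rates in Figure 14 follow from dividing the input rate by each stage's decimation factor in turn. A minimal Python sketch of that arithmetic is given below; the names `output_rate` and `CONFIGS` are illustrative only.

```python
def output_rate(input_rate_msps, factors):
    """Divide the input sample rate (MS/s) by each decimation factor in turn."""
    rate = input_rate_msps
    for d in factors:
        rate /= d
    return rate


# The three two-stage configurations listed in Figure 14.
CONFIGS = [(2, 11), (8, 5), (10, 5)]
rates = [output_rate(1600, stages) for stages in CONFIGS]
```

The first configuration gives 1600/22 = 72.727... MS/s, which the figure lists as 72.72 MS/s; the other two give exactly 40 MS/s and 32 MS/s.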
Table 1. Total buffer requirements (number of tokens) from a DIF prototype for different decimation factors using parameterized looped schedules.

Decimation factor    5     6     7     8     9     10    11    12
SDF                  132   140   148   156   164   172   180   188
CSDF                 100   100   100   100   100   100   100   100
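The trend in Table 1 can be checked with a few lines of Python. The linear expression 92 + 8D below is only an observation fitted to the tabulated SDF values, not a formula given by the source, and the names `sdf_fit` and `savings` are hypothetical.

```python
# Values copied from Table 1.
FACTORS = [5, 6, 7, 8, 9, 10, 11, 12]
SDF_TOKENS = [132, 140, 148, 156, 164, 172, 180, 188]
CSDF_TOKENS = [100] * len(FACTORS)


def sdf_fit(d):
    """Linear fit to the tabulated SDF values (an observation, not a derivation)."""
    return 92 + 8 * d


# Tokens saved by the CSDF model at each decimation factor.
savings = {d: s - c for d, s, c in zip(FACTORS, SDF_TOKENS, CSDF_TOKENS)}
fit_matches = all(sdf_fit(d) == s for d, s in zip(FACTORS, SDF_TOKENS))
```

Over the tabulated range, the SDF requirement grows by 8 tokens per unit increase in the decimation factor while the CSDF requirement stays at 100, so the saving rises from 32 tokens at D = 5 to 88 tokens at D = 12.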
Table 2. Implementation summary for TDD designs. In all the designs below, the input bandwidth is 800 MHz, and the decimation factor, D, is tunable such that 5 ≤ D ≤ 12.

Parameter                          Design 1      Design 2      Design 3      Design 4
Mixer                              No            Yes           No            Yes
Latency (ns)                       65            150           85            190
FPGA slices (out of 23616)         12234 (52%)   13315 (56%)   12322 (52%)   14232 (60%)
4-input LUTs (out of 47232)        14139 (29%)   16123 (34%)   12123 (25%)   15035 (31%)
Block RAMs (out of 232)            41 (17%)      48 (20%)      41 (17%)      48 (20%)
18 × 18 multipliers (out of 232)   -             -             32 (13%)      95 (40%)
Table 3. Implementation summary for FDD designs. In all the designs below, the input bandwidth is 800 MHz.

Parameter                          Design 1      Design 2      Design 3      Design 4
Mixer                              No            No            Yes           Yes
Decimation factor                  8             10            8             10
Bw (MHz)                           100           80            100           80
Cf (MHz)                           50            40            400           400
Latency (ns)                       35            440           50            455
FPGA slices (out of 23616)         4175 (17%)    6142 (26%)    5690 (24%)    6439 (27%)
4-input LUTs (out of 47232)        5153 (10%)    5216 (11%)    5984 (12%)    6003 (12%)
Block RAMs (out of 232)            41 (17%)      41 (17%)      49 (21%)      49 (21%)
18 × 18 multipliers (out of 232)   8 (3%)        8 (3%)        32 (13%)      32 (13%)
Table 4. Implementation summary for designs employing two-stage downconversion using cascaded FDF or TDF blocks. In all the designs below, the input bandwidth is 800 MHz. None of these designs employs a mixer block.

Parameter                          Design 1          Design 2      Design 3          Design 4
No. of FDF blocks                  0                 2             1                 1
No. of TDF blocks                  2                 0             1                 1
FDF decimation factor(s)           -                 8, 10         8                 10
Bw (MHz) ∗∗                        Tunable (≤ 800)   10            Tunable (≤ 100)   Tunable (≤ 80)
Latency (ns)                       170               475           120               505
FPGA slices (out of 23616)         17141 (72%)       5765 (24%)    11073 (46%)       12641 (53%)
4-input LUTs (out of 47232)        19718 (41%)       5506 (11%)    12245 (25%)       12310 (26%)
Block RAMs (out of 232)            41 (17%)          41 (17%)      41 (17%)          41 (17%)
18 × 18 multipliers (out of 232)   64 (27%)          16 (6%)       40 (17%)          40 (17%)

∗∗ Bw, if tunable, can be tuned to frequencies consistent with decimation factors supported by the TDD block.
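The Bw entries in Table 4 are consistent with dividing the 800 MHz input bandwidth by the product of the fixed (FDF) decimation factors in each design. The Python sketch below checks this reading; the relationship is inferred from the tabulated values rather than stated in the source, and the names `max_output_bw` and `FDF_FACTORS` are hypothetical.

```python
from math import prod

INPUT_BW_MHZ = 800  # input bandwidth stated in the Table 4 caption

# FDF decimation factors per design, copied from Table 4 (Design 1 has none).
FDF_FACTORS = {1: (), 2: (8, 10), 3: (8,), 4: (10,)}


def max_output_bw(fdf_factors):
    """Bandwidth (MHz) left after the fixed decimation stages only."""
    return INPUT_BW_MHZ / prod(fdf_factors)  # prod(()) == 1


bounds = {design: max_output_bw(f) for design, f in FDF_FACTORS.items()}
```

Under this reading, Design 2 (no tunable stage) has a fixed bandwidth of 10 MHz, while the values for Designs 1, 3, and 4 are upper limits that the TDF stage reduces further according to the selected decimation factor.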
