High level design space exploration of RVC codec specifications for multi-core heterogeneous platforms by Lucarz, Christophe et al.
HIGH LEVEL DESIGN SPACE EXPLORATION OF RVC CODEC
SPECIFICATIONS FOR MULTI-CORE HETEROGENEOUS PLATFORMS
Christophe Lucarz1, Ghislain Roquier1, Marco Mattavelli1
1 Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
emails:{firstname.lastname}@epfl.ch
ABSTRACT
Nowadays, the design flow of complex signal processing
embedded systems starts with a specification of the
application by means of a large sequential program
(usually in C/C++). As we enter the multi-core era,
sequential programs are no longer the most appropriate
way to specify algorithms targeted to run on several
processing units. The new ISO/MPEG Reconfigurable
Video Coding (RVC) standard proposes a new paradigm
for specifying and designing complex signal processing
systems. The RVC standard enables the specification of
new codecs by assembling blocks, so-called Functional
Units (FUs), from a standard Video Tool Library (VTL).
Flexibility, reusability, and modularity are the key
features of RVC. This way of specifying algorithms
simplifies the task of designing future video coding
applications by allowing software and hardware reuse
across multiple video coding standards. Specifications
are provided in the form of an actor and dataflow-based
language called CAL. Although the RVC standard does not
imply any specific implementation design flow, it is an
appropriate starting point for targeting platforms with
multiple processing units. This paper describes a new
model-driven design flow which considers both algorithm
and architecture to map RVC codec specifications onto
heterogeneous multi-core systems.
1. INTRODUCTION
Designing and implementing complex digital systems such
as video decoders on heterogeneous multi-core platforms
is a very difficult task. It is even more difficult when
the process has to result in an implementation with
strict requirements on performance and/or resources.
Nowadays, the starting point of any traditional design
flow is a specification of the algorithm, generally
expressed in an imperative programming language (like
C/C++, as used in MPEG).
This work is part of the ACTORS European Project (Adaptivity
and Control of Resources in Embedded Systems), funded
in part by the European Union's Seventh Framework Programme.
Grant agreement no. 216586
This way of specifying algorithms is essentially
sequential and imposes an unnecessary ordering on the
operations to perform. It also hides the inherent
parallelism of the algorithm, which is extremely useful
now that we are entering the multi-core era. Even if
many designs already take care to expose parallelism by
using threads, it is more and more difficult to predict
the behavior of the final application. It becomes clear
that languages coming from the sequential paradigm are
no longer appropriate to specify complex signal
processing algorithms. A shift towards another paradigm,
better suited to the challenges of parallelism, is
necessary.
The CAL Language [1] is a recently specified dataflow
and actor-based language capable of concisely expressing
complex signal processing algorithms. The language has
features that are well suited to describing parallel
applications [2]. A subset of the original CAL language
has been standardized in the new MPEG Reconfigurable
Video Coding framework (RVC) [3, 4] and is used to
specify the standard Video Tool Library (VTL).
This paper presents a complete design flow aiming at
easing the implementation of complex signal processing
systems starting from the dataflow specification used by
MPEG RVC. Since this specification has a higher level of
abstraction than the one provided by common imperative
sequential languages, it can be used when targeting both
software and hardware implementations, thus achieving a
truly unified representation at the level of the
specification. When designers develop at low levels of
abstraction, any modification of the design can be very
resource-consuming since it has to take into account
low-level details of both software and hardware
implementations. Being able to design systems at a high
level of abstraction, by taking into account
implementation constraints (with a correspondence to the
low levels of abstraction) and by detecting bottlenecks
at the early stages of the design process, saves time
and resources in the design of any complex processing
system.
Section 2 explains the implications of using the CAL
language in terms of implementation. Section 3 describes
the steps of the design flow. Section 4 describes the
supporting tools. Section 5 introduces the design space
representation. Section 6 presents a case study, the
MPEG-4 Simple Profile decoder as defined within the RVC
standard library. Section 7 discusses the advantages and
drawbacks of the described methodology and outlines
perspectives of future work. Section 8 concludes the
paper.
2. WHAT DOES THE IMPLEMENTATION OF CAL
PROGRAMS IMPLY?
Raising the abstraction level, made possible by the use
of CAL, presents several advantages for the design of
signal processing systems in terms of design
productivity: low-level implementation details are not
taken into account, and only the architecture of the
dataflow processing is considered at the CAL level.
However, designing at a high level does not guarantee an
efficient implementation; an appropriate approach to the
CAL dataflow design is still required.
Inter-actor communication channels are implemented as
finite-size FIFOs, potentially introducing performance
limitations in case of concurrent memory accesses. An
actor generally consists of a set of actions that may
fire depending on the current state of the actor, the
availability of tokens and the satisfaction of firing
conditions (guards). If several actors are mapped onto
the same processing unit, several actions may be
fireable at the same time. Thus, a scheduling policy
must be defined to select the action to be fired next.
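The firing rules described above can be illustrated with a minimal Python sketch. It is not CAL and not the RVC runtime: the class names, the FIFO capacity and the round-robin policy are assumptions chosen only to show how guards, token availability and a scheduling policy interact when several actors share one processing unit.

```python
from collections import deque


class Fifo:
    """Finite-size FIFO channel between two actors (capacity is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = deque()

    def can_read(self, n=1):
        return len(self.tokens) >= n

    def can_write(self, n=1):
        return len(self.tokens) + n <= self.capacity


class Action:
    """An action is fireable when its guard (tokens, state, space) holds."""

    def __init__(self, name, guard, body):
        self.name, self.guard, self.body = name, guard, body


class Actor:
    def __init__(self, name, actions):
        self.name, self.actions = name, actions

    def fire_one(self):
        """Fire the first fireable action; return its name, or None."""
        for action in self.actions:
            if action.guard():
                action.body()
                return action.name
        return None


def run_round_robin(actors, max_steps=1000):
    """Hypothetical policy: visit actors in turn until no action can fire."""
    log = []
    for _ in range(max_steps):
        fired = False
        for actor in actors:
            name = actor.fire_one()
            if name is not None:
                log.append(f"{actor.name}.{name}")
                fired = True
        if not fired:
            break
    return log


# Illustrative network: producer -> FIFO -> doubler -> output list
source = deque([1, 2, 3])
chan = Fifo(capacity=2)
out = []

producer = Actor("producer", [Action(
    "emit",
    guard=lambda: bool(source) and chan.can_write(),
    body=lambda: chan.tokens.append(source.popleft()))])
doubler = Actor("doubler", [Action(
    "double",
    guard=lambda: chan.can_read(),
    body=lambda: out.append(2 * chan.tokens.popleft()))])

firing_log = run_round_robin([producer, doubler])
```

Every pass of the loop re-evaluates guards and touches the FIFOs, which is the kind of control overhead an efficient implementation tries to reduce.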
A design that aims at efficient implementations starting
from CAL dataflow programs needs to minimize the
overhead introduced by control mechanisms such as the
scheduling of actions and FIFO accesses, which are
implied by the dataflow paradigm on which the CAL
language is built. This is the objective of the design
flow and of the tools that implement it.
3. DESIGN FLOW
The most interesting feature of a dataflow-based design
flow is that there is a close correspondence between the
high-level specification and its actual implementation.
Token flows correspond to bandwidths, partitions of a
CAL network correspond to mappings onto architectural
components, and so on. Relying on this correspondence,
a design flow alternating design phases and evaluation
phases may be able to detect and correct bottlenecks
while remaining at a high level of abstraction. Figure 1
illustrates the main steps of the design flow:
• The preliminary step (Sequential to Parallel trans-
formation) aims at building a first CAL pro-
gram from any speciﬁcation. It results in an
architecture-agnostic CAL program which exposes
the inherent parallelism and the data ﬂow struc-
ture of the algorithm.
• The design loop (composed of the characteriza-
tion, profiling, partitioning and scheduling steps) is
responsible for building an appropriate CAL
program according to a speciﬁc target platform
(thick red arrows).
• The evaluation loop is responsible for evaluating
the design, given a target platform (thin blue ar-
rows).
• After each evaluation, the designer has the choice
to refactor the CAL program (CAL Transforma-
tions) or stop the refactoring iterations for the ﬁ-
nal implementation. Automatic hardware and soft-
ware synthesis tools are used to generate the plat-
form speciﬁc code including the computed parti-
tioning and scheduling.
Many possible optimization criteria can be used to
drive the refactoring. However, performance maximization
and resource minimization, combined in different
trade-offs, are the most common.
3.1. Sequential to Parallel program transforma-
tion
The first step is to write a specification in the form
of a CAL program. If the starting point is a sequential
imperative program, metrics such as the computational
load of functions, the critical path of functions and
the amount of data transferred between functions are
valuable information for building a dataflow program.
During this process, the designer can be supported by
various profiling tools that show where the complexity
of the algorithm lies, in order to design an efficient
first version of the CAL program.
3.2. Characterization
Once the first version of the CAL dataflow program is
built, the designer can characterize it by running
static and dynamic (trace-based and simulation-based)
analyses to extract: the nature of the actors (SDF [5],
CSDF [6] or DDF [7]), the computational load of each
actor/action, the dependencies between actors, the data
exchanges between actions, the number of executions of
actions, and the critical path of the execution of the
program.
3.3. Profiling
[Figure 1: flow diagram linking Specification,
Sequential to Parallel transformation, Characterization
(Metrics), Profiling (Execution times), Partitioning &
Scheduling (Design point), Evaluation of the CAL
program, Designer choice, CAL Transformations, Library
of Algorithms and Heuristics, HW/SW Code Generators and
Implementation.]
Fig. 1: Design flow: interlacing of a design loop and an
evaluation loop.

The profiling step consists in determining the execution
times of actions according to the considered underlying
architecture. The specificities of the target platform
can be taken into account in order to obtain a good
approximation of the execution time of the actions. For
example, in the ARM 7500FE processor, a General Purpose
Processor (GPP), a multiply instruction takes one
instruction fetch and m internal cycles, m being the
number of cycles required by the multiply algorithm,
which is determined by the contents of the registers. In
Digital Signal Processors (DSPs), a Multiply-Accumulate
operation costs only one clock cycle. This step is very
important because the partitioning and scheduling steps
are based on the values of the computational load of the
actions.
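As a minimal sketch of this conversion, the Python fragment below turns the operation counts of one action firing into an execution time in clock cycles for two kinds of processors. The cost table and the operation counts are hypothetical values for illustration; only the one-cycle Multiply-Accumulate on the DSP follows the text, and real figures would have to come from the processor documentation or from measurements.

```python
# Hypothetical per-operation costs in clock cycles (assumed values).
CYCLES_PER_OP = {
    "gpp": {"add": 1, "mul": 4, "mac": 5, "load": 2},
    "dsp": {"add": 1, "mul": 1, "mac": 1, "load": 1},
}


def action_cycles(op_counts, processor):
    """Weight the computational load of one action firing for a processor."""
    costs = CYCLES_PER_OP[processor]
    return sum(count * costs[op] for op, count in op_counts.items())


# Made-up computational load of a single firing of some action.
load_profile = {"add": 10, "mac": 8, "load": 16}

gpp_cycles = action_cycles(load_profile, "gpp")
dsp_cycles = action_cycles(load_profile, "dsp")
```

The same computational load therefore yields different weights depending on the processing unit, which is why this step has to be repeated when the target platform changes.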
3.4. Partitioning and scheduling
Partitioning consists in mapping a network of actors
onto processing units. Scheduling consists in
determining the execution policy for the actions mapped
onto the same processing unit. The problem of optimally
partitioning and scheduling CAL programs is an NP-hard
combinatorial problem, and efficient heuristics are
currently being studied by several research groups [8]
[9].
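To make the partitioning problem concrete, here is a small Python sketch of one classical heuristic, longest-processing-time-first load balancing. It is not one of the heuristics of [8] or [9]; the actor names and loads are invented, and the sketch ignores dependencies between actors and communication costs.

```python
import heapq


def greedy_partition(actor_loads, num_units):
    """Assign the heaviest remaining actor to the least loaded unit."""
    heap = [(0, unit) for unit in range(num_units)]
    heapq.heapify(heap)
    mapping = {}
    ordered = sorted(actor_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for actor, load in ordered:
        unit_load, unit = heapq.heappop(heap)
        mapping[actor] = unit
        heapq.heappush(heap, (unit_load + load, unit))
    return mapping


def unit_loads(actor_loads, mapping, num_units):
    """Total computational load placed on each processing unit."""
    loads = [0] * num_units
    for actor, unit in mapping.items():
        loads[unit] += actor_loads[actor]
    return loads


# Invented actor loads (e.g. in thousands of clock cycles).
loads = {"parser": 70, "idct": 50, "iq": 30, "iap": 40, "motion": 60}
mapping = greedy_partition(loads, 2)
per_unit = unit_loads(loads, mapping, 2)
```

The largest per-unit load is only a rough proxy for the makespan, since a real schedule must also respect the dependencies between action firings.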
3.5. Evaluation
Detecting bottlenecks and modifying the architecture of
a system at a high abstraction level (i.e. the CAL
level) is advantageous in terms of productivity. It is
more problematic and resource-consuming when done at low
levels of abstraction (e.g. in C for software and in
VHDL for hardware implementations). This is the most
attractive feature of the CAL dataflow design flow.
However, to make it practically possible, the evaluation
step that validates the behavior of the refactored CAL
program on the target platform must be performed without
low-level manual rewriting, and its results must be
sufficiently reliable to drive the appropriate changes
to the dataflow architecture. Since there is a
correspondence between specific architecture-dependent
implementation components and elements of the CAL
program, the designer can evaluate the results of
successive refactorings and the consequent architecture
changes over several iterations, thus isolating and
evaluating the effect of each single change on the
implementation.
3.6. CAL refactoring
The designer has essentially two refactoring
possibilities at the level of actors: splitting or
merging, which respectively increase or decrease the
explicit level of parallelism. Another level of
refactoring is at the level of actions, which may
consume/generate tokens at coarser or finer
granularity.
3.7. Code generation
The most attractive feature of CAL dataflow-based
design is the existence of synthesis tools that can
generate C/C++/LLVM/Java or VHDL/Verilog
implementations from CAL programs. Software (C language)
code generators are described in [10] and [11], and a
hardware (VHDL/Verilog) code generator in [12].
4. TOOLS
This section describes in more detail the tools that
support the steps of the design flow described in the
previous section. The tools are classified into two
categories: the characterization tools, which aim at
extracting the properties of the CAL program, and the
exploration tools, which aim at exploring the design
space. All these tools are integrated in an environment
called CAL Design Suite [13], which is developed in Java
and released as open source on Sourceforge.
4.1. Characterization Tools
StatiCAL performs static analyses of the CAL code.
It extracts information such as the structure of the
program and the size of the ports. ProfiCAL determines
the execution times of actions by means of a dynamic
analysis of the CAL program given an input stimulus. It
records the computational load of the actions composing
the program and measures execution times. TraciGen
records an execution of the CAL program in a graph
representation (causation trace) in which each node
represents a firing of an action and arcs represent
dependencies between action executions. TraciCAL
analyzes the causation trace and extracts statistics on
the execution. CrossCAL generates new metrics (e.g. data
transfers between actions, code coverage rate) by
combining existing metrics with one another.
4.2. Exploration Tools
SchedulCAL computes partitioning and scheduling
configurations for the CAL program according to
different optimization criteria. For example, load
balancing, which aims at sharing the computations
equally among the different processors, has been
implemented in the environment. Algorithms aiming at
maximizing the throughput are currently under
development. EvalCAL evaluates the resulting refactored
CAL program, once it has been scheduled and partitioned.
AnalytiCAL displays all the results obtained with the
characterization and exploration tools.
5. DESIGN SPACE EXPLORATION
Designing a highly complex digital streaming system does
not lead to a single solution. The design space
representation provides a means to visualize graphically
and to compare different designs according to given
criteria (e.g. performance, resources) in order to
evaluate the efficiency of the current design.
Axes System design is guided by constraints:
performance, resources, size, cost, etc. Designers try
to optimize the design according to the chosen
criteria. Thus, the design space can be represented
according to these optimization criteria. The number
of criteria defines the dimension of the design
space, e.g. 2D or 3D if two or three criteria are
considered, respectively, and so on. Each criterion
represents an axis of the design space
representation. Usually, throughput and resources are
the criteria that define the axes of the 2D design
space representation. This results in the
representation illustrated in figure 2, with
Throughput as criterion 1 and Resources as criterion
2.
Target Region Depending on the maximization,
minimization or lower/upper bound conditions, specific
regions (or volumes) of interest can be defined.
The target region is defined according to the
selected optimization criteria, as illustrated in
figure 2 (2D design space).
Point Each point in the design space represents one
triplet: a CAL program, a schedule and a partition.
A design cannot be evaluated if it is not completely
characterized. The partition is a mapping that
assigns each actor to exactly one processing
element. The schedules represent the ordering of
actions on each processing unit.
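The notions of point and target region can be sketched as a small Python data structure. All field names and numeric values below are hypothetical; the two criteria follow the 2D case of the text (throughput to maximize, resources to minimize), and the `dominates` helper is an extra convenience for comparing two points, not something defined in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    program: str       # identifier of the CAL program version (assumed)
    partition: tuple   # partition[i] = processing unit of actor i
    schedule: str      # name of the scheduling policy (assumed)
    throughput: float  # criterion 1, to be maximized
    resources: int     # criterion 2, to be minimized


def in_target_region(point, min_throughput, max_resources):
    """Check both requirements that delimit the target region."""
    return (point.throughput >= min_throughput
            and point.resources <= max_resources)


def dominates(a, b):
    """a is no worse than b on both criteria and strictly better on one."""
    return (a.throughput >= b.throughput and a.resources <= b.resources
            and (a.throughput > b.throughput or a.resources < b.resources))


# Made-up design points for illustration.
initial = DesignPoint("v0", (0, 0, 1, 1), "round-robin", 0.30, 4)
refactored = DesignPoint("v1", (0, 1, 1, 0), "round-robin", 0.45, 3)

initial_ok = in_target_region(initial, min_throughput=0.40, max_resources=4)
refactored_ok = in_target_region(refactored, min_throughput=0.40, max_resources=4)
```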
[Figure 2: 2D design space with Criteria 1 and Criteria
2 as axes (maximize performances, minimize resources),
showing a Scheduled and Partitioned CAL Program as a
point, the requirement on each criterion and the target
region.]
Fig. 2: Representation of the design space.
Starting from a given Scheduled and Partitioned CAL
Program (SPCP) located in the design space, the de-
signer has several possibilities to move towards the target
region:
1. Refactor the CAL program, and then ﬁnd again a
new partitioning and scheduling
2. Apply a new partitioning and scheduling to the
CAL program
3. Apply a new scheduling to the CAL program, keep-
ing the same partitions
[Figure 3: design space with Throughput and Software
resources as axes, showing the Initial SPCP, several
Intermediate SPCPs and the Final SPCP connected by
transformations labelled 1, 2 and 3, together with the
performance requirement, the resources requirement and
the target region.]
Fig. 3: Exploring the design space with different
transformations.
Figure 3 presents an example of exploration of the
design space.
Starting from an initial CAL program, a CAL
transformation (1) leads to an intermediate CAL program
which results in a better throughput with lower resource
usage. Another scheduling (3) or partitioning (2)
applied to this intermediate model leads to another
location in the design space.
6. CASE STUDY: RVC MPEG-4 SP
DECODER
The described design flow has been applied to the design
of the RVC MPEG-4 Simple Profile decoder. The CAL
program has been written starting from the C/C++
reference software given by the MPEG specification. The
example illustrates a design space exploration aiming at
higher performance in terms of throughput while using
the minimum number of processors. The steps of the
design flow are described in more detail in the
following paragraphs.
Characterization This first step in the design flow
aims at profiling the CAL network. Static and
dynamic analyses are performed to extract metrics
such as the computational load of each actor, the
data transfers between actors, the number of dynamic
calls of actors and actions, the nature of the
actors and the critical path of the execution.
Weighting This step consists in assigning execution
times to the actions of the CAL program. The
characterization step provides the computational
load of actions. As a first step, the computational
load of each action is converted into an execution
time on the target processors, expressed in clock
cycles.
Partitioning/Scheduling This step outputs the
assignment of actors to processing units and the
scheduling of actions on each processing unit.
The tool predicts the behavior of the whole system
given the execution time of each action in clock
cycles, the partitioning of actors and the
scheduling of actions. The tool estimates the
performance of the system in terms of throughput.
The algorithm used for the partitioning and the
scheduling is based on a simulated annealing
approach (see footnote 1). This step is applied by
considering 2, 4, 8, 16, 32 and 64 processors.
Evaluation The makespan is the metric used to
evaluate the performance of the design. It
corresponds to the number of clock cycles necessary
for the decoder to decode the input bitstream. This
step is applied by considering 2, 4, 8, 16, 32 and
64 processors. The resulting makespans before the
optimization are reported in Table 1, presented in
section 6.1.
CAL Transformations Designers have several
possibilities to increase the performance of a
system, as mentioned in 3.5. The designer can
refactor part of the CAL code and/or apply a
different partitioning/scheduling heuristic. In
large CAL programs (such as the one studied in this
case study, i.e. 324 actions distributed over 63
actors), it may be difficult to determine which
action/actor refactoring provides the highest
potential gains without appropriate metrics and
corresponding measures. For this reason, an
algorithm has been developed to detect the most
interesting actions to optimize.
The algorithm is based on the causation trace and on
the measure of the length of the critical path. A cau-
sation trace of a dataﬂow program is a directed acyclic
graph such that:
• every node is a ﬁring of an action of an actor in the
program,
• every edge from v1 to v2 is a dependency (either
through a token, state or port) of v2 on v1,
implying that v1 has to be executed before v2.
[Figure 4: DC Reconstruction network, (a) before and
(b) after refactoring.]
Fig. 4: Refactoring of the DC Reconstruction network.

Footnote 1: This algorithm has been developed by Martin
Niemeier and Andreas Karrenbauer, from the Chair of
Discrete Optimization (DISOPT, EPFL)
http://disopt.epfl.ch/

The critical path is the longest weighted path from
the source node of the causation trace to its sink node,
the weights being the execution times of actions. The
critical path of the execution is provided by the
CrossCAL tool and the causation trace by TraciGen (see
section 4). The granularity of the critical path is at
the action level. Correlating the computational load of
actions with their contribution to the critical path
identifies the bottlenecks (i.e. actions) of the system
on which design efforts (i.e. dataflow program
refactoring) must be focused. Applied to the RVC MPEG-4
Simple Profile decoder, this algorithm indicates the
following optimizations as having the highest potential
of throughput improvement:
1. 30 % of action copy of the actor IAP (Inverse AC
Prediction)
2. 10 % of action ac of the actor IQ (Inverse Quanti-
zation)
3. 10 % more of action copy of the actor IAP (Inverse
AC Prediction)
4. 10 % of action read_write of actor IS (Inverse
Scan)
To optimize the listed critical actions, i.e. to reduce
their execution time, the designer has several
possibilities: rewrite the action body (atomic and
purely sequential) in order to optimize its sequence of
operations, or split the actor into several actors in
order to expose more explicit parallelism and to reduce
the critical path, i.e. the sequential part of the
actor.
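The critical path computation on a causation trace can be sketched as follows, taking the critical path to be the longest weighted path through the directed acyclic graph, i.e. the path that bounds the execution time. This is an illustration, not the CrossCAL implementation; the firing names, weights and dependencies are invented.

```python
from graphlib import TopologicalSorter


def critical_path(weights, predecessors):
    """Return (length, path) of the longest weighted path in a DAG.

    weights: firing -> execution time in clock cycles
    predecessors: firing -> set of firings it depends on
    """
    graph = {node: predecessors.get(node, set()) for node in weights}
    finish = {}     # earliest finish time of each firing
    best_pred = {}  # predecessor that determines that finish time
    for node in TopologicalSorter(graph).static_order():
        prev = max(graph[node], key=lambda p: finish[p], default=None)
        best_pred[node] = prev
        start = finish[prev] if prev is not None else 0
        finish[node] = start + weights[node]
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = best_pred[end]
    return length, path[::-1]


# Invented causation trace: one firing per node, weights in clock cycles.
weights = {"parse#0": 5, "iq.ac#0": 20, "iap.copy#0": 30, "idct#0": 10}
predecessors = {
    "iq.ac#0": {"parse#0"},
    "iap.copy#0": {"parse#0"},
    "idct#0": {"iq.ac#0", "iap.copy#0"},
}

length, path = critical_path(weights, predecessors)
```

In this toy trace the firing of `iap.copy` lies on the critical path while `iq.ac` does not, so reducing the execution time of `iap.copy` is what shortens the path first.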
Weighting This step consists in assigning execution
times to actions. In order to estimate the potential
improvement in the speed of the system brought by
these optimizations, new execution times are
assigned to the actions. These execution times are
computed according to the optimizations output by
the algorithm described above: the execution time of
action IAP/copy is reduced by 40 %, IQ/ac by 10 %
and IS/read_write by 10 %.
Partitioning/Scheduling The same simulated an-
nealing algorithm is used in order to compute new
partitioning and scheduling after the optimization
of the actions. This step is applied by considering
2, 4, 8, 16, 32 and 64 processors.
Evaluation This step is applied by considering 2, 4, 8,
16, 32 and 64 processors. The resulting makespans
after the optimization are reported in Table 1,
presented in section 6.1.
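For readers unfamiliar with simulated annealing, the following Python sketch shows the general technique applied to partitioning. It is a generic textbook formulation, not the algorithm used in this case study: the cost function (largest per-unit load), the cooling schedule, the parameters and the actor loads are all assumptions, and dependencies between actions are ignored.

```python
import math
import random


def max_load(actor_loads, mapping, num_units):
    """Simplified cost: the largest total load on any processing unit."""
    loads = [0] * num_units
    for actor, unit in mapping.items():
        loads[unit] += actor_loads[actor]
    return max(loads)


def anneal_partition(actor_loads, num_units, steps=2000,
                     t_start=50.0, t_end=0.1, seed=0):
    """Move one actor at a time, sometimes accepting worse partitions."""
    rng = random.Random(seed)
    actors = sorted(actor_loads)
    current = {a: rng.randrange(num_units) for a in actors}
    cost = max_load(actor_loads, current, num_units)
    best, best_cost = dict(current), cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        actor = rng.choice(actors)
        old_unit = current[actor]
        current[actor] = rng.randrange(num_units)
        new_cost = max_load(actor_loads, current, num_units)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            current[actor] = old_unit  # reject the move
    return best, best_cost


# Invented actor loads for illustration.
loads = {"parser": 70, "idct": 50, "iq": 30, "iap": 40, "motion": 60}
best_mapping, best_cost = anneal_partition(loads, num_units=2)
```

Accepting some cost-increasing moves while the temperature is high is what lets the search escape partitions that are only locally good.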
6.1. Discussion of the results
This case study is an example of design space
exploration that shows the main conceptual steps of the
design flow, even though not all the possibilities and
functionalities of the supporting tools are used in this
example. Figure 5 reports the results in the form of a
graph in which the percentage values represent the
improvement of the makespan obtained after optimization
for each number of processors. Thanks to the established
correspondence between the CAL level and the low level,
optimizing the program at the high level in terms of
makespan results in a faster design after
implementation. Working at a high level while knowing
that improvements made there carry over directly to the
implementation is where the real time savings in the
design process come from.
Table 1 reports the makespans obtained before and
after optimization, considering the mapping onto 1, 2, 4,
8, 16, 32 and 64 processors respectively.
[Figure 5: number of processors plotted against
throughput (images per million clock cycles), before and
after optimization.]
Fig. 5: Exploration of the design space.

Nb of proc.   Before opt.   After opt.   Speedup
1             28 759 336    26 570 383    7.6 %
2             14 422 202    13 301 576    7.8 %
4              7 734 369     6 957 486   10.0 %
8              4 549 236     3 838 300   15.6 %
16             3 152 493     2 489 814   21.0 %
32             2 931 854     2 194 023   25.2 %
64             2 930 104     2 032 209   30.6 %
Table 1: Comparison of results (makespans in clock
cycles) before and after optimization.

It may be asked why an optimization of more than 30 % of
the makespan results in an improvement of only 15.6 %
when considering eight processors. The critical path can
be considered as the makespan of the full execution of
the program on a platform with as many processing units
as there are actors in the program. Thus, if instead of
63 processing units we consider only eight, all the
tasks that are not on the critical path must be packed
onto these eight processors, resulting in a longer
makespan and reducing the potential impact of the
optimization of the critical path.
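The last column of Table 1 can be checked with a few lines of Python. The makespans are copied from the table; the formula, a relative reduction of the makespan, is an assumption about how the column was obtained, and it reproduces the reported percentages.

```python
# Makespans in clock cycles, copied from Table 1: (before, after).
makespans = {
    1: (28_759_336, 26_570_383),
    2: (14_422_202, 13_301_576),
    4: (7_734_369, 6_957_486),
    8: (4_549_236, 3_838_300),
    16: (3_152_493, 2_489_814),
    32: (2_931_854, 2_194_023),
    64: (2_930_104, 2_032_209),
}


def reduction_percent(before, after):
    """Relative reduction of the makespan, in percent."""
    return 100.0 * (before - after) / before


improvements = {
    procs: round(reduction_percent(before, after), 1)
    for procs, (before, after) in makespans.items()
}
```

The improvement grows monotonically with the number of processors, from 7.6 % on one processor to 30.6 % on 64.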
An interesting result is that the optimization of the
critical path has a higher impact when considering the
mapping onto a larger number of processors. This is
consistent with the fact that the algorithm which
detects bottlenecks implicitly considers a system with
as many processors as actors. Thus, increasing the
number of processors brings the design case closer to
the one considered by the algorithm.
6.1.1. Analysis of the bottlenecks
The algorithm for detecting bottlenecks by means of the
analysis of the critical path indicates that the actor
"Inverse AC Prediction" (IAP) in the Y channel is the
next bottleneck. The 8x8 blocks inside a Y macroblock
are processed sequentially by this actor. This
constitutes a serialization point that can be
parallelized.
Fig. 6: Principle of the AC prediction.
Figure 6 illustrates the prediction process and the
underlying dependencies between blocks. Currently, the
IAP actor processes the four blocks of the Y macroblock
sequentially. It can be parallelized such that block X0
is processed first, then blocks {X1, X2} and finally
block X3. Blocks {X1, X2} depend on block X0, and block
X3 depends on blocks {X1, X2}. Figures 7 and 8
illustrate how the IAP actor can be parallelized.
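The processing order that follows from these dependencies can be derived mechanically. The Python sketch below groups the blocks of one Y macroblock into levels whose members can be processed in parallel; it only encodes the dependencies stated above and assumes, for simplicity, that dependencies on neighbouring macroblocks are ignored.

```python
def dependency_levels(depends_on):
    """Group items into successive levels of mutually independent items."""
    levels = []
    done = set()
    remaining = dict(depends_on)
    while remaining:
        ready = sorted(block for block, deps in remaining.items()
                       if set(deps) <= done)
        if not ready:
            raise ValueError("cyclic dependencies")
        levels.append(ready)
        done.update(ready)
        for block in ready:
            del remaining[block]
    return levels


# Dependencies between the four 8x8 blocks of a Y macroblock, as stated
# in the text: X1 and X2 depend on X0, X3 depends on X1 and X2.
deps = {"X0": [], "X1": ["X0"], "X2": ["X0"], "X3": ["X1", "X2"]}
levels = dependency_levels(deps)
```

Within a single macroblock this gives three processing steps instead of four when two processing units are available.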
Proc. 0 X0 X1 X2 X3 X0 X1 X2 X3
Fig. 7: Current sequential processing of the IAP actor.
[Figure 8: timeline showing the blocks X0 to X3 of
successive macroblocks distributed over Proc. 0 and
Proc. 1.]
Fig. 8: Potential parallel processing of the IAP actor.
The actor Inverse AC Prediction can potentially be
optimized by 50 % because, instead of outputting four
blocks in a given amount of time, the refactored actor
outputs twice as many.
7. GENERAL DISCUSSION AND FUTURE
WORK
The case study presented in this paper illustrates the
attractive features of this design flow based on the CAL
language, namely the ability to explore the design space
at a high level of abstraction by testing different
partitioning and scheduling policies without rewriting
the whole low-level implementation code. Furthermore,
raising the level of abstraction with the CAL language
gives designers a unified representation for hardware
and software. It allows designers to focus on the design
of the application instead of spending time on low-level
implementation details. Consequently, designers can
concentrate on how to find more efficient partitions of
the system, how the platform can exploit the parallelism
explicitly exposed in the CAL program, and how to design
the application so that it allows reusability. The proof
of concept has been completed successfully and the
proposed design flow is promising.
Concerning future work, efforts must now be focused
on raising additional implementation details to upper
levels (i.e. to the CAL level), such as communication
costs between processing units. Unlike in imperative
programming (e.g. C/C++), it is straightforward to
include communication costs in CAL because actors
exchange data with one another through FIFO buffers. The
aim of taking these implementation details into account
is to lower the level of abstraction of the evaluation
and to refine it in order to get closer to the execution
model. Taking communication costs into account should
improve the evaluation process. Furthermore, there are
still other possible improvements of the evaluation
process: memory access time, scheduling overhead and
others.
8. CONCLUSION
This paper presented a new model-based design flow for
complex embedded systems. The originality of our work
resides in the use of a language capable of unifying the
hardware and software worlds under a single
representation, of expressing both architectural and
algorithmic concepts, of composing a design in a modular
way, and of partitioning complex programs in a
straightforward manner in order to design heterogeneous
multi-core systems. The design flow has been
successfully applied to the MPEG-4 SP decoder as
described by the RVC specification. The case study is a
proof of concept that the designer can explore the
design space without rewriting the program. This work
provides designers with a complete methodology to
implement decoders defined by the new ISO/MPEG RVC
standard. By raising the level of abstraction, designers
can build embedded systems more efficiently.
9. REFERENCES
[1] J. Eker and J. Janneck, “CAL Language Re-
port,” Tech. Rep. ERL Technical Memo UCB/ERL
M03/48, University of California at Berkeley, Dec.
2003.
[2] Christophe Lucarz, Marco Mattavelli, Matthieu
Wipliez, Ghislain Roquier, Mickaël Raulet, Jörn W.
Janneck, Ian D. Miller, and Dave B. Parlour,
“Dataflow/Actor-Oriented language for the design
of complex signal processing systems,” in Workshop
on Design and Architectures for Signal and Image
Processing (DASIP), Bruxelles, Belgium, Nov. 2008,
pp. 168–175.
[3] Christophe Lucarz, Ihab Amer, and Marco Mattavelli,
“Reconfigurable video coding: Objectives and
technologies,” in IEEE International Conference on
Image Processing, Cairo, Egypt, Nov. 2009.
[4] Shuvra S. Bhattacharyya, Johan Eker, Jörn Janneck,
Christophe Lucarz, Marco Mattavelli, and
M. Raulet, “Overview of the MPEG Reconfigurable
Video Coding Framework,” Journal of Signal
Processing Systems, 2009.
[5] Edward A. Lee and David G. Messerschmitt, “Static
scheduling of synchronous data ﬂow programs for
digital signal processing,” IEEE Trans. Comput.,
vol. 36, no. 1, pp. 24–35, 1987.
[6] Greet Bilsen, Marc Engels, Rudy Lauwereins, and
Jean Peperstraete, “Cyclo-static dataﬂow,” IEEE
transactions on signal processing, vol. 44, no. 2, pp.
397–408, 1996.
[7] Joseph T. Buck, Scheduling Dynamic Dataflow
Graphs with Bounded Memory Using the Token
Flow Model, Ph.D. thesis, EECS Department, Uni-
versity of California, Berkeley, 1993.
[8] Jani Boutellier, Christophe Lucarz, Sébastien
Lafond, Victor Gomez, and Marco Mattavelli,
“Quasi-Static scheduling of CAL actor networks for
reconfigurable video coding,” Journal of Signal
Processing Systems, 2009.
[9] Ruirui Gu, Jörn W. Janneck, Mickaël Raulet, and
Shuvra S. Bhattacharyya, “Exploiting statically
schedulable regions in dataflow programs,”
Acoustics, Speech, and Signal Processing, IEEE
International Conference on, vol. 0, pp. 565–568, 2009.
[10] Ghislain Roquier, Matthieu Wipliez, Mickaël Raulet,
Jörn W. Janneck, Ian D. Miller, and David B.
Parlour, “Automatic software synthesis of dataflow
program: an MPEG-4 Simple Profile decoder case
study,” in IEEE Workshop on Signal Processing
Systems (SiPS 2008), Washington, D.C., USA,
2008, pp. 281–286.
[11] “The Open RVC-CAL Compiler
Sourceforge open source project,”
http://sourceforge.net/projects/orcc/.
[12] J.W. Janneck, I.D. Miller, D.B. Parlour,
G. Roquier, M. Wipliez, and J.-F. Nezan, “Synthe-
sizing hardware from dataﬂow programs,” Journal
of Signal Processing Systems, 2009.
[13] “The CAL Design Suite Source-
forge open source project,”
http://sourceforge.net/projects/caldesignsuite/.
