Abstract-This paper presents the implementation of a video decoding application starting from its dataflow and CAL representations. Our objective is to demonstrate the ability of the Open RVC-CAL Compiler (Orcc) to generate code for embedded systems. For the demonstration, the video application is an MPEG-4 Part 2 decoder. The targeted architecture is a heterogeneous multi-core system deployed on the Zynq platform from Xilinx.
I. INTRODUCTION
Increasingly complex communication and video standards, together with more powerful and heterogeneous multi-core hardware architectures, have made the design of modern electronic systems an arduous process. To help designers in this task, models that abstract both the hardware and software parts have been studied. It is in this context that frameworks such as the Open RVC-CAL Compiler (Orcc) [1] have evolved. The primary purpose of Orcc is to generate code in a variety of classic programming languages (e.g. C, C with directives, VHDL) starting from actors written in the CAL Actor Language (CAL) and their interconnection network. The Compa project also intends to provide a solution in this domain [2].
Compa is an ANR (French National Research Agency) project running from October 2011 to April 2015, and it is also supported by the "Images et Réseaux" Cluster. In this project, an abstract model of the HW architecture is also considered. In addition, a runtime engine (cf. Fig. 1) is being designed to provide multiple services, e.g. actor execution and dynamic mapping. Concerning dynamic mapping, a taxonomy of mapping methodologies is presented in [3]. Within this taxonomy, the methodology we intend to develop falls into the hybrid class; that is, we will use design-time information to perform dynamic mapping at runtime. As a first approach, this information consists of the processors' workloads. A similar analysis has already been carried out in previous work, e.g. [4], which will be used as a reference for comparison with our results.
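To make the idea of workload-driven mapping concrete, the following is a minimal sketch of one possible policy: assign each actor to the PE with the lowest accumulated workload. It is not the Compa runtime itself; the type `pe_state_t`, the functions `pick_least_loaded` and `map_actor`, and the notion of a per-actor cost estimate obtained at design time are all assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-PE record: accumulated workload in arbitrary time units. */
typedef struct {
    unsigned long workload;
} pe_state_t;

/* Return the index of the least-loaded PE (the lowest index wins ties). */
size_t pick_least_loaded(const pe_state_t *pes, size_t n_pes)
{
    assert(pes != NULL && n_pes > 0);
    size_t best = 0;
    for (size_t i = 1; i < n_pes; ++i) {
        if (pes[i].workload < pes[best].workload)
            best = i;
    }
    return best;
}

/* Map one actor onto the least-loaded PE and account for its cost.
 * actor_cost stands for a design-time estimate (an assumption here). */
size_t map_actor(pe_state_t *pes, size_t n_pes, unsigned long actor_cost)
{
    size_t pe = pick_least_loaded(pes, n_pes);
    pes[pe].workload += actor_cost;
    return pe;
}
```

A hybrid scheme in the sense of [3] could call `map_actor` at runtime while taking `actor_cost` from profiling performed at design time; more elaborate policies would also weigh communication costs between actors.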
II. THE ORCC'S COMPA-BACKEND
The purpose of Orcc's Compa-Backend is to generate C code for actors that is well suited to embedded system architectures. The C code can be enriched with OpenHMPP directives [5] in order to target accelerators such as GPUs, thanks to simple dedicated annotations introduced in the CAL source. Fig. 2 shows the HW architecture targeted by the Compa-Backend (in the red box). The system is composed of several Processing Elements (PEs), heterogeneous or not. One PE plays the role of master and the others are slaves. Each slave PE is connected to the master, e.g. through bidirectional mailboxes, to enable master-slave communication. Data exchanges between slaves take place through dedicated FIFOs (circular buffers), which can be implemented in shared or distributed memory 1.
1. In a dataflow network, actors communicate through FIFO channels.
A. Targeted HW architecture
The system uses the dual-core ARM processor in the Processing System region as the master PE. As many as 16 MicroBlaze processors have been synthesized in the Programmable Logic region as the slave PEs. Most of the slaves have access to 64 Kb of local memory through the instruction/data Local Memory Bus (LMB). They also have access to a 128 Kb shared memory through an AXI interconnect.
B. Targeted SW structure
As a first implementation, we use a simple static mapping strategy. We have intentionally matched the number of slave PEs to the number of actors, so that each actor's code is copied into the local memory of one slave PE. Consequently, no Global Runtime, Compa Kernel, or master-slave communication is required. For data exchange, a total of 32 FIFOs, each 512 deep, have been implemented in the shared memory. Note that the number of FIFOs is derived from the number of channels in the application's dataflow model.
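A FIFO of this kind can be sketched as a single-producer, single-consumer circular buffer. The code below is an illustration under stated assumptions, not the code generated by the Compa-Backend: the 32-bit token type, the names `fifo_t`, `fifo_count`, `fifo_rooms`, `fifo_write` and `fifo_read`, and the free-running index scheme are all hypothetical. Only the depth of 512 comes from the text.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_DEPTH 512u  /* depth given in the text; must be a power of two here */

/* One FIFO channel. The 32-bit token type is an assumption. */
typedef struct {
    volatile uint32_t read_idx;   /* advanced only by the consumer */
    volatile uint32_t write_idx;  /* advanced only by the producer */
    int32_t data[FIFO_DEPTH];
} fifo_t;

/* Indices run freely and are reduced modulo FIFO_DEPTH on access,
 * so full and empty states are distinguishable without a spare slot. */
uint32_t fifo_count(const fifo_t *f) { return f->write_idx - f->read_idx; }
uint32_t fifo_rooms(const fifo_t *f) { return FIFO_DEPTH - fifo_count(f); }

/* Write one token; returns 1 on success, 0 if the FIFO is full. */
int fifo_write(fifo_t *f, int32_t token)
{
    assert(f != 0);
    if (fifo_rooms(f) == 0)
        return 0;
    f->data[f->write_idx % FIFO_DEPTH] = token;
    f->write_idx = f->write_idx + 1;  /* publish after the data is stored */
    return 1;
}

/* Read one token; returns 1 on success, 0 if the FIFO is empty. */
int fifo_read(fifo_t *f, int32_t *token)
{
    assert(f != 0 && token != 0);
    if (fifo_count(f) == 0)
        return 0;
    *token = f->data[f->read_idx % FIFO_DEPTH];
    f->read_idx = f->read_idx + 1;
    return 1;
}
```

On a real multi-processor target, the 32 instances would be placed in the shared memory region, and memory barriers or uncached accesses would be needed so that each side observes the other's index updates; those platform-specific details are omitted from the sketch.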
C. First results
Implementing the system with a one-to-one mapping strategy allows us to validate the code generated by Orcc's Compa-Backend. In addition, some metrics have been observed. The total execution time for the first 9 frames of the video sequence is 0.41 s, i.e. approximately 22 frames per second (fps).
This result is very poor compared to a modern decoder system; however, we have identified some of the reasons. First of all, in the current implementation of the backend, the computations are performed directly on the FIFOs, so the bus traffic to the shared memory is very dense. We therefore expect better results with a solution in which the data to be used is first moved into local memory. We have also measured the computing time, i.e. the actor's execution time when data is present in the input FIFO(s) and the produced values can be written into the output FIFO(s). The maximum computing time obtained is 0.21 s, i.e. 51% of the total time (for the 9 frames). This result indicates that the slaves are waiting for data for almost half of the application's execution time. At present, based on the computing times previously obtained, we are exploring the system's behavior by varying the mappings and the number of processors.
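The expected improvement, moving data into local memory before computing on it, can be sketched as follows. This is an assumption about how such a scheme might look, not the backend's actual code: `fifo_read_block`, `actor_compute`, the burst size `BLOCK_SIZE` and the 32-bit token type are hypothetical, and the actor body is a placeholder that merely sums its tokens. A minimal FIFO type is repeated so that the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_DEPTH 512u
#define BLOCK_SIZE 64u   /* hypothetical burst size, chosen for illustration */

/* Minimal FIFO living in shared memory (token type is an assumption). */
typedef struct {
    uint32_t read_idx;
    uint32_t write_idx;
    int32_t  data[FIFO_DEPTH];
} fifo_t;

/* Copy up to max_tokens available tokens from the shared FIFO into a
 * buffer in local memory, in one pass. Returns the number copied. */
uint32_t fifo_read_block(fifo_t *f, int32_t *local, uint32_t max_tokens)
{
    assert(f != 0 && local != 0);
    uint32_t avail = f->write_idx - f->read_idx;
    uint32_t n = avail < max_tokens ? avail : max_tokens;
    for (uint32_t i = 0; i < n; ++i)
        local[i] = f->data[(f->read_idx + i) % FIFO_DEPTH];
    f->read_idx += n;
    return n;
}

/* Placeholder actor body: it touches only local memory, so repeated
 * accesses to a token no longer go through the shared bus. */
int64_t actor_compute(const int32_t *local, uint32_t n)
{
    int64_t acc = 0;
    for (uint32_t i = 0; i < n; ++i)
        acc += local[i];
    return acc;
}
```

With this structure each token crosses the bus once per read, regardless of how many times the actor body uses it; whether this pays off in practice depends on the actor's access pattern and on the cost of the extra copy.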
D. Advanced implementation
We have also started the implementation of the first steps toward dynamic mapping, and thus the foundations of the Compa Global and CompaK runtime.
We have connected the instruction and data caches of the slave processors to an external memory, and we have placed the actors' code and local data in this memory. The caches have been dimensioned such that almost all of the assigned actor(s) can fit in them. With this architecture, any slave processor will be able to execute any actor at any time. It will also allow us to experiment with actor migration mechanisms.
IV. CONCLUSIONS
In this demo we present part of the Compa project. We demonstrate the use of Orcc's Compa-Backend in a real scenario, namely an MPEG-4 Part 2 decoding application. The application model contains 17 actors and uses 32 FIFO channels. As a first implementation, we use a one-to-one (one actor per processor) mapping strategy. As this is work in progress, we also expect to present dynamic mapping during the demo.
