Virtualization of heterogeneous and adaptive multi-core / multi-board systems by Oey, O. et al.
Virtualization of Heterogeneous and Adaptive Multi-Core / Multi-Board Systems 
Oliver Oey¹, Stephan Werner¹, Diana Göhringer¹,²,*, Andreas Stuckert², Jürgen Becker¹, Michael Hübner¹,³,+
¹ Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
² Fraunhofer IOSB, Ettlingen, Germany
³ Ruhr-University Bochum (RUB), Germany
{oliver.oey, stephan.werner, diana.goehringer, becker}@kit.edu, andreas.stuckert@iosb.fraunhofer.de, michael.huebner@rub.de
Abstract: 
This paper presents a virtualization approach for heterogeneous adaptive multi-core systems distributed onto several FPGA boards. The 
virtualization layer consists of an adapted embedded Linux kernel and several special purpose operating systems. The benefits are 
demonstrated with a complex image processing application. 
1. Overview of the system 
3.3 FSM of Accelerator
[Figure: Controlling the Accelerator]

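3.2 MPI Program of the Linux Master
/* The MPI calls use the signatures shown on the poster, which belong to the
   system's own MPI library and differ from standard MPI. */
int main(void) {
    unsigned int result = 0;
    unsigned int field[10] = {2,3,4,5,6,7,8,9,10,11};
    unsigned int attr;

    MPI_Init("demo.xml");                     /* pass the XML description file */
    MPI_Send(field, 10, MPI_INT, 0, 0, 0);    /* send the 10 input values */
    do {                                      /* poll the FSM attribute until it reads 0 */
        MPI_Comm_get_attr(0, MPI_FSM_MASTER, &attr, NULL);
    } while (attr != 0);
    attr = 0;
    MPI_Comm_set_attr(0, MPI_FSM_MASTER, &attr);   /* write 0 to the FSM attribute */
    MPI_Recv(&result, 1, MPI_INT, 0, 0, 0, 0);     /* receive the single result value */
    MPI_Finalize();

    return 0;
}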
 <task>
  <ID>0</ID>
  <algo_type>2</algo_type>
  <filename>xml.root</filename>
  <D_global>50</D_global>
  <data>64</data>
  <child>
   <ID>1</ID>
   <cost>4</cost>
  </child>
  <child>
   <ID>2</ID>
   <cost>4</cost>
  </child>
 </task>
 <task>
  <ID>1</ID>
  <hw_acc>0</hw_acc>
  <algo_type>0</algo_type>
  <exec_time>0</exec_time>
  <D_global>0</D_global>
  <rcfg_time>0</rcfg_time>
  <filename>demo_ub_s1.elf</filename>
  <child>
   <ID>3</ID>
   <cost>307200</cost>
  </child>
  <BestN>
   <ID>3</ID>
   <cost>307200</cost>

[1] D. Göhringer: "Flexible Design and Dynamic Utilization of Adaptive Scalable Multi-Core Systems". PhD thesis, Verlag Dr. Hut, München, 2011
[2] S. Werner, O. Oey, D. Göhringer, M. Hübner, J. Becker: "Virtualized On-Chip Distributed Computing for Heterogeneous Reconfigurable Multi-Core Systems". DATE 2012, March 2012
[3] D. Göhringer, L. Meder, M. Hübner, J. Becker: "Adaptive Multi-Client Network-on-Chip Memory". ReConFig 2011, Nov./Dec. 2011
[4] D. G. Lowe: "Distinctive Image Features from Scale-Invariant Keypoints". International Journal of Computer Vision, 60(2), pp. 91-110, 2004
2. From XML to DFG 
Summary: 
• SIFT algorithm distributed across 2 FPGA boards
• Heterogeneous system with processors and hardware accelerators
• Execution controlled by several OSes 
• Programming via MPI 
3. Programming the System 
4. SIFT [4]
• Extraction of descriptors for local features in images
• The image is segmented into 40 tiles of 64 × 120 pixels each
• Descriptors are overlaid onto the original image

1 Slave CPU     60 sec    58 sec
4 Slave CPUs    24 sec    22 sec
Note: The accelerator works with more accurate data, which is why the performance is almost the same.
*Diana Göhringer was previously at Fraunhofer IOSB and is now at the Karlsruhe Institute of Technology (KIT)
+Michael Hübner was previously at the Karlsruhe Institute of Technology (KIT) and is now at Ruhr-University Bochum (RUB)
This work was financially supported by Fraunhofer IOSB, Ettlingen, Germany
[Figure labels: Slave Board, Master Board, Ethernet, DVI, VGA, UART. Legend: Superswitch, Subswitch, Slave Processor, Master Processor, Hardware accelerator, Memory Controller, Ethernet agent]
