Efficient Co-Simulation of Multicore Systems by Brock-Nannestad, Laust et al.
Publication date: 2011
Document version: Publisher's PDF, also known as Version of record
Citation (APA):
Brock-Nannestad, L., Passas, S., & Karlsson, S. (2011). Efficient Co-Simulation of Multicore Systems. Poster session presented at 4th Swedish Workshop on Multicore Computing, Linköping, Sweden.
Efficient Co-Simulation of Multicore Systems
Laust Brock-Nannestad, Stavros Passas and Sven Karlsson
Technical University of Denmark
Figure: Workflow.
Motivation
• Simulating hardware helps debugging by exposing a hardware design’s inner operation
• Simulation is significantly slower than the actual hardware implementation
• Can we simulate at hardware speeds?
  - By default, use the FPGA to simulate at high speed
  - Switch to a software simulator where visibility is required
Contributions
• A method for debugging and co-simulation of hardware designs at close to real-time hardware speeds
• A mechanism for transparently capturing and moving state between FPGA and software simulator
• Performance evaluation and validation using two circuits
Approach
• We capture the state of the FPGA and transfer it to our software
• The state is injected into a software simulation model and simulation is resumed from the point of capture
• We also perform the reverse process: transferring state from the software simulation model back into the FPGA
• The hardware design is augmented with a small "capture" core which initiates a capture
• We rely on the capture and readback functionality implemented by the FPGA
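The round trip described above can be illustrated with a toy model. This is a minimal sketch and not the poster's tooling: the `CounterModel` class, its `capture_state` and `inject_state` methods, and the counter design itself are hypothetical stand-ins. The same Python class plays the role of both the FPGA and the software simulator, so the sketch only shows why capturing every state element lets execution resume exactly at the point of capture.

```python
from dataclasses import dataclass, field


@dataclass
class CounterModel:
    """Hypothetical stand-in for a design: an 8-bit counter plus a 4-word memory."""
    count: int = 0
    memory: list = field(default_factory=lambda: [0] * 4)

    def step(self, cycles: int = 1) -> None:
        """Advance the design by the given number of clock cycles."""
        for _ in range(cycles):
            self.memory[self.count % 4] = self.count
            self.count = (self.count + 1) % 256

    def capture_state(self) -> dict:
        """Snapshot every state element (flip-flops and memory contents)."""
        return {"count": self.count, "memory": list(self.memory)}

    def inject_state(self, state: dict) -> None:
        """Overwrite every state element from a previously captured snapshot."""
        self.count = state["count"]
        self.memory = list(state["memory"])


fpga = CounterModel()       # plays the role of the fast FPGA execution
simulator = CounterModel()  # plays the role of the slow, fully visible simulator

fpga.step(1000)                               # run at "hardware speed"
simulator.inject_state(fpga.capture_state())  # FPGA -> simulator
simulator.step(10)                            # inspect a short window in detail
fpga.inject_state(simulator.capture_state())  # simulator -> FPGA

# An uninterrupted run of the same length must end in the same state.
reference = CounterModel()
reference.step(1010)
assert fpga.capture_state() == reference.capture_state()
```

The final assertion only holds because the snapshot covers all state. State held outside the captured elements, such as an external memory, would break the equivalence.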
Conclusions
• We implemented a method for co-simulation of hardware designs using a software simulator and an FPGA
• We validated our approach using a small circuit in a single FPGA
• Our evaluation of performance shows that co-simulation is beneficial for long simulation times
Future Work
• Consider the state of components external to the FPGA, e.g. memories
• Handle all complex HDL constructs, such as state encodings chosen by the synthesis tool
• Allow for multiple FPGAs
Experimental setup
• Experiments were carried out on a Xilinx Spartan-3AN FPGA running at 50 MHz
• ModelSim 6.5 was used for simulations
• We used a generic low-cost JTAG dongle for all communication with the FPGA
• Two hardware designs were implemented using VHDL
  - A small design used to validate correctness
  - A large design to evaluate performance
                      HDL Design
Resource              Validation   Evaluation
Slice flip-flops              64          381
LUTs                          42         1691
Block RAMs                     1           16
Table: Resource utilization for the two FPGA designs.
Results
• To evaluate performance, we compare the execution time of 50,000,000 cycles in the software simulator and on the FPGA
• For the evaluation design, hardware is more than 300 times faster than simulation
• We also measure the time from hardware description to running simulation and from hardware description to running FPGA
• Finally, we measure the time to transfer the state between hardware and software when executing our workflow
                                   Execution time (seconds)
Task                               Validation   Evaluation
Simulate 50,000,000 cycles                104          355
Run 50,000,000 cycles on FPGA               1            1
Generate initial bitstream                 58          309
Download of bitstream                       1            1
Readback and parse                         33           37
Update simulation model                     4          N/A
Compile in ModelSim                         1            4
Regeneration of bitstream                  13           33
Table: Execution times for baseline and steps in the workflow.
• Generating the initial bitstream is very slow due to synthesis and mapping
  - It only has to take place once
  - Subsequent modifications to the bitstream are fast, as we only alter the initial states of flip-flops and block memories
• For short simulation times, the faster compile time of the software simulator is beneficial, despite the slow simulation speed
• For the evaluation design, regular software simulation is more than 300 times slower than execution on the FPGA: one second on the FPGA takes 355 seconds to simulate
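The trade-off between fast compilation and slow simulation can be made concrete with a small break-even estimate using the validation-design timings from the execution-time table. This is a rough sketch under assumptions that the poster does not state: run time is assumed to scale linearly with cycle count, the workflow steps are assumed to run one after another, and a single hand-off from FPGA to simulator is assumed. The function names are illustrative only.

```python
CYCLES_PER_RUN = 50_000_000  # cycle count used for the measured runs

# Validation-design timings in seconds, copied from the execution-time table.
SIMULATE_RUN = 104        # simulate 50,000,000 cycles
FPGA_RUN = 1              # run 50,000,000 cycles on the FPGA
COMPILE_MODELSIM = 1
GENERATE_BITSTREAM = 58
DOWNLOAD_BITSTREAM = 1
READBACK_AND_PARSE = 33
UPDATE_SIM_MODEL = 4


def simulation_only(cycles: float) -> float:
    """Estimated seconds to compile and simulate `cycles` purely in software."""
    return COMPILE_MODELSIM + SIMULATE_RUN * cycles / CYCLES_PER_RUN


def fpga_then_simulator(cycles: float) -> float:
    """Estimated seconds to run `cycles` on the FPGA and hand the state over once.

    Assumes the fixed steps are simply summed and run time scales linearly.
    """
    fixed = (GENERATE_BITSTREAM + DOWNLOAD_BITSTREAM + READBACK_AND_PARSE
             + UPDATE_SIM_MODEL + COMPILE_MODELSIM)
    return fixed + FPGA_RUN * cycles / CYCLES_PER_RUN


def break_even_cycles() -> float:
    """Cycle count at which both estimates take the same time."""
    fixed_gap = fpga_then_simulator(0) - simulation_only(0)
    per_cycle_gap = (SIMULATE_RUN - FPGA_RUN) / CYCLES_PER_RUN
    return fixed_gap / per_cycle_gap


if __name__ == "__main__":
    print(f"break-even at about {break_even_cycles():,.0f} cycles")
```

Under these assumptions the two approaches cost the same at roughly 46.6 million cycles for the validation design. Below that, pure software simulation finishes first. Above it, the one-time bitstream generation is amortized and the FPGA path is faster.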
DTU Informatics - Technical University of Denmark laustbn@acm.org http://www.imm.dtu.dk
