HW/SW Co-designed Processors: Challenges, Design Choices and a Simulation Infrastructure for Evaluation by Cano, José et al.
• Correctness
- It should not change program behavior
- Arch./memory states compared periodically
• Minimum TOL overhead
- TOL execution time must be small
- 3-stage translation/optimization, chaining
• Minimum emulation cost
- Host to guest instruction ratio must be low
- Aggressive/speculative optimizations
• Support for multiple guest ISAs (front-ends)
- User → app → ISA → device
- Incorporating additional front-ends is simple
• Plug and play support
- Easy to include/evaluate new features
- Modular design
• Debugging
- Strong debug toolchain
- Debug mechanism activated if mismatch detected
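
The correctness and debugging bullets above (periodic comparison of architectural and memory state, with a debug mechanism activated on mismatch) can be illustrated with a minimal sketch. It assumes a reference model and a model under test that advance in lockstep; `ArchState`, `first_mismatch`, `run_with_checks` and the step callbacks are hypothetical names for illustration and are not DARCO's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Hypothetical guest-visible state: registers plus a sparse memory image."""
    regs: dict = field(default_factory=dict)
    mem: dict = field(default_factory=dict)


def first_mismatch(reference, candidate):
    """Return a description of the first difference, or None if the states match."""
    for name in sorted(set(reference.regs) | set(candidate.regs)):
        a, b = reference.regs.get(name), candidate.regs.get(name)
        if a != b:
            return f"reg {name}: {a} != {b}"
    for addr in sorted(set(reference.mem) | set(candidate.mem)):
        # Unwritten memory locations are assumed to read as 0.
        a, b = reference.mem.get(addr, 0), candidate.mem.get(addr, 0)
        if a != b:
            return f"mem {addr:#x}: {a} != {b}"
    return None


def run_with_checks(ref_step, cand_step, ref, cand, total_steps, check_every):
    """Advance both models in lockstep and compare every `check_every` steps.

    Returns (step, description) at the first detected mismatch, else None.
    """
    for step in range(1, total_steps + 1):
        ref_step(ref)
        cand_step(cand)
        if step % check_every == 0:
            diff = first_mismatch(ref, cand)
            if diff is not None:
                # Hook point: this is where a debug mechanism would be activated.
                return step, diff
    return None


def good_step(state):
    state.regs["r0"] += 1


def buggy_step(state):
    # Deliberately wrong once r0 reaches 5, to show a detected divergence.
    state.regs["r0"] += 2 if state.regs["r0"] == 5 else 1


result = run_with_checks(good_step, buggy_step,
                         ArchState(regs={"r0": 0}), ArchState(regs={"r0": 0}),
                         total_steps=10, check_every=4)
print(result)
```

In this sketch the divergence occurs at step 6 but is only reported at the next checkpoint, which shows the trade-off behind the checking period: a longer period lowers the comparison cost but widens the window that has to be searched when a mismatch is found.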
HW/SW Co-designed Processors:
Challenges, Design Choices and a Simulation Infrastructure for Evaluation
José Cano (1), Rakesh Kumar (2), Aleksandar Brankovic (3), Demos Pavlou (4), Kyriakos Stavrou (4), Enric Gibert (5), Alejandro Martínez (6), Antonio González (7)
(1) University of Edinburgh, UK; (2) Uppsala University, Sweden; (3) Intel; (4) 11pets; (5) Pharmacelera; (6) ARM; (7) Universitat Politècnica de Catalunya, Spain
The problem
DARCO: Evaluation
ARM Research Summit, Cambridge, UK - September 11-13, 2017
DARCO: Simulation infrastructure
Conclusions
• These processors need to address some key challenges before they can become mainstream
• There are no simulation infrastructures for evaluating different design choices and trade-offs to meet these challenges
[Figure: DARCO block diagram and execution flow. Recoverable labels: x86 component, x86 binary, data and instruction path, BB translate, store in code $ (code cache), chain.]
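
The translate, store and chain steps named in the figure labels can be sketched as a small dispatcher loop. This is a toy model under stated assumptions, not DARCO's implementation: every block is assumed to have a single, statically known successor, and `TranslatedBlock`, `CodeCache`, `run` and the `translate` callback are hypothetical names.

```python
class TranslatedBlock:
    """A translated basic block (BB) living in the code cache."""

    def __init__(self, guest_pc, host_code, successor_pc):
        self.guest_pc = guest_pc
        self.host_code = host_code        # stand-in for generated host instructions
        self.successor_pc = successor_pc  # guest PC of the next block, None = program end
        self.chained_to = None            # direct link to the successor's translation


class CodeCache:
    def __init__(self, translate):
        # Hypothetical callback: guest_pc -> (host_code, successor_pc)
        self.translate = translate
        self.blocks = {}
        self.translations = 0

    def lookup_or_translate(self, guest_pc):
        """Return the cached translation, translating and storing it on a miss."""
        block = self.blocks.get(guest_pc)
        if block is None:
            host_code, successor_pc = self.translate(guest_pc)
            block = TranslatedBlock(guest_pc, host_code, successor_pc)
            self.blocks[guest_pc] = block
            self.translations += 1
        return block


def run(cache, entry_pc, max_blocks):
    """Execute up to max_blocks blocks; return (trace of guest PCs, dispatcher lookups)."""
    trace, lookups = [], 1
    block = cache.lookup_or_translate(entry_pc)
    while len(trace) < max_blocks:
        trace.append(block.guest_pc)
        if block.successor_pc is None:
            break
        if block.chained_to is None:
            # Slow path: go through the dispatcher once, then chain the blocks
            # so later executions jump straight to the successor.
            block.chained_to = cache.lookup_or_translate(block.successor_pc)
            lookups += 1
        block = block.chained_to
    return trace, lookups


# Toy guest program: a two-block loop, 0 -> 4 -> 0 -> ...
successors = {0: 4, 4: 0}
cache = CodeCache(lambda pc: (f"host@{pc}", successors[pc]))
trace, lookups = run(cache, entry_pc=0, max_blocks=6)
print(trace, lookups, cache.translations)
```

Once both blocks are chained, the loop keeps executing without further dispatcher lookups, which is the overhead that chaining is meant to remove.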
Challenges in building an infrastructure vs DARCO
• HW/SW co-designed processors
- Potential to improve energy efficiency and performance
- Several industrial projects, no major project in academia
• Challenges
- To become mainstream (e.g. startup delay)
- To build a simulation infrastructure (e.g. software layer overhead)
• DARCO
- May enable academic research in the HW/SW co-designed domain

• Where to implement (HW or SW) microarchitectural features
- Instruction decoding/reordering, register renaming, memory disambiguation, …
• How to reduce “startup delay”
- One of the major problems of Transmeta processors
• When and where to translate/optimize the guest binaries
- As soon as code becomes “hot”?
• How to address speculative execution (memory, control)
- Checkpointing granularity?
• When and how to profile the execution
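
The "hot" code question above can be made concrete with a minimal profiling sketch: count executions per basic block and promote a block to translation/optimization once it crosses a threshold. The class and function names and the threshold value are assumptions for illustration only; the poster does not specify a profiling mechanism.

```python
from collections import Counter


class HotnessProfiler:
    """Counts basic-block executions and flags a block when it becomes hot."""

    def __init__(self, hot_threshold):
        self.hot_threshold = hot_threshold
        self.counts = Counter()
        self.promoted = set()

    def record(self, guest_pc):
        """Record one execution; return True exactly once, when the block turns hot."""
        self.counts[guest_pc] += 1
        if guest_pc not in self.promoted and self.counts[guest_pc] >= self.hot_threshold:
            self.promoted.add(guest_pc)
            return True
        return False


def promotions(trace, hot_threshold):
    """Return (position in trace, guest_pc) for every block that gets promoted."""
    profiler = HotnessProfiler(hot_threshold)
    return [(i, pc) for i, pc in enumerate(trace) if profiler.record(pc)]


# Hypothetical execution trace of basic-block start addresses.
trace = [0x10, 0x20, 0x10, 0x10, 0x20, 0x30, 0x20]
print(promotions(trace, hot_threshold=3))
```

In this model a lower threshold promotes blocks earlier, which may shorten the startup phase, at the price of spending translation/optimization effort on code that may never run again. Exploring that kind of trade-off is what a simulation infrastructure is needed for.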

• HW/SW co-designed processors
- A lot of potential to improve
  • Performance (DBTO in SW)
  • Energy efficiency (simple HW)
- Some projects from industry
  • IBM DAISY/BOA, Transmeta Crusoe/Efficeon, NVIDIA Denver
  • But no successful product yet
- No major project from academia
