Quantitative Characterization of the Software Layer of a HW/SW Co-Designed Processor
There is renewed interest in HW/SW co-designed processors because they can boost performance without running into the power and complexity walls. By employing a software layer that performs dynamic binary translation and applies aggressive optimizations that exploit runtime application behavior, these hybrid architectures provide better performance per watt. However, a poorly designed software layer can incur significant translation/optimization overheads that may offset its benefits. This work presents a detailed characterization of the software layer of a HW/SW co-designed processor using a variety of benchmark suites. We observe that the performance of the software layer is very sensitive to the characteristics of the emulated application, with a variance of more than 50%. We also show that the interaction between the software layer and the emulated application, as they share microarchitectural resources, can affect performance by 0-20%. Finally, we identify key elements that should be investigated further to reduce the observed variations in performance. The paper provides critical insights for improving the design of the software layer.
Peer Reviewed. Postprint (author's final draft)
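To make the idea of a staged software layer and its overheads concrete, here is a minimal Python model. It assumes a three-tier scheme (interpret, translate, optimize) with hypothetical promotion thresholds and cycle costs; none of these names or numbers come from the paper. The model separates cycles spent running guest code from cycles spent translating and optimizing it.

```python
from dataclasses import dataclass, field

# Hypothetical tier parameters (illustrative only, not measured values).
TRANSLATE_THRESHOLD = 10      # executions before a region is translated
OPTIMIZE_THRESHOLD = 100      # executions before it is re-optimized
COST_PER_EXEC = {"interpret": 20, "translated": 2, "optimized": 1}  # cycles
PROMOTION_COST = {"translated": 1_000, "optimized": 5_000}          # cycles


@dataclass
class RegionState:
    count: int = 0
    tier: str = "interpret"


@dataclass
class StagedTranslator:
    regions: dict = field(default_factory=dict)
    exec_cycles: int = 0      # cycles spent running guest code
    overhead_cycles: int = 0  # cycles spent in translation/optimization

    def _promote(self, state: RegionState, tier: str) -> None:
        state.tier = tier
        self.overhead_cycles += PROMOTION_COST[tier]

    def execute(self, pc: int) -> str:
        """Execute the guest region at pc once; return the tier that ran it."""
        state = self.regions.setdefault(pc, RegionState())
        used = state.tier
        self.exec_cycles += COST_PER_EXEC[used]
        state.count += 1
        # Promote hot regions so that their *next* execution uses the faster tier.
        if used == "interpret" and state.count >= TRANSLATE_THRESHOLD:
            self._promote(state, "translated")
        elif used == "translated" and state.count >= OPTIMIZE_THRESHOLD:
            self._promote(state, "optimized")
        return used


if __name__ == "__main__":
    dbt = StagedTranslator()
    tiers = [dbt.execute(0x400) for _ in range(150)]
    total = dbt.exec_cycles + dbt.overhead_cycles
    print(tiers.count("interpret"), tiers.count("translated"), tiers.count("optimized"))
    print(total, 150 * COST_PER_EXEC["interpret"])
```

With these made-up parameters, a region executed 150 times runs 10 times interpreted, 90 times translated and 50 times optimized, for 6,430 cycles in total versus 3,000 if it had only been interpreted. The region is not hot enough to repay its translation and optimization, which is the kind of application-dependent overhead the characterization measures.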
HW/SW Co-designed Processors: Challenges, Design Choices and a Simulation Infrastructure for Evaluation
Improving single-thread performance is a key challenge in modern microprocessors, especially because the traditional approach of increasing clock frequency and deepening pipelines cannot be pushed further due to power constraints. Researchers have therefore been looking at unconventional architectures that boost single-thread performance without running into the power wall. HW/SW co-designed processors such as Nvidia Denver are emerging as a promising alternative. However, before they can become mainstream, HW/SW co-designed processors need to address key challenges such as startup delay, delivering high performance with simple hardware, and translation/optimization overhead. Evaluating the design choices and trade-offs needed to meet these challenges fundamentally requires a simulation infrastructure, and no such infrastructure is available today. Building one is itself challenging, because it must encompass the complexities of both an architectural framework and a compilation framework. This paper identifies the key challenges that HW/SW co-designed processors face and the basic requirements for a simulation infrastructure targeting these architectures. Furthermore, the paper presents DARCO, a simulation infrastructure to enable research in this domain.
Peer Reviewed. Postprint (author's final draft)
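One trade-off behind the startup-delay and translation-overhead challenges can be captured by a first-order cost model: translating a code region is worthwhile only once it has run often enough for the per-execution savings to cover the one-time translation cost. The function below is a hypothetical sketch of that model, not something taken from the paper or from DARCO, and the cycle counts in the usage note are made up.

```python
def break_even_executions(translate_cost: int, slow_cost: int, fast_cost: int) -> int:
    """Smallest number of executions after which translating a region pays off.

    translate_cost: one-time cycles spent translating/optimizing the region.
    slow_cost, fast_cost: cycles per execution before and after translation.
    """
    saving = slow_cost - fast_cost
    if saving <= 0:
        raise ValueError("translation must make each execution faster")
    # Integer ceiling division: ceil(translate_cost / saving).
    return -(-translate_cost // saving)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000-cycle translation, 20 -> 2 cycles per execution.
    print(break_even_executions(1_000, 20, 2))
```

With these assumed costs, a region must execute at least 56 times before translating it pays off. Regions executed fewer times only add startup delay, which is why the choice of hotness thresholds is a key design knob that a simulation infrastructure must let researchers explore.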
DDGacc: boosting dynamic DDG-based binary optimizations through specialized hardware support
Software-based Dynamic Binary Translation (DBT) and Dynamic Binary Optimization (DBO) are widely used for several reasons, including performance, design simplification and virtualization. However, the software layer in such systems introduces non-negligible overheads that affect performance and user experience. Reducing DBT/DBO overheads is therefore of paramount importance. In addition, reduced overheads have interesting collateral effects on the rest of the software layer, such as allowing optimizations to be applied earlier. A cost-effective solution to this problem is to provide hardware support that speeds up the primitives of the software layer, paying special attention to automating DBT/DBO mechanisms while leaving the heuristics to software, which is more flexible.
In this work, we characterize the overheads of a DBO system built on DynamoRIO that implements several basic optimizations. We find that computing the Data Dependence Graph (DDG) accounts for 5%-10% of the execution time. For this reason, we propose adding hardware support for this task in the form of a new functional unit, called DDGacc, which is integrated into a conventional pipelined processor and operated through new ISA instructions. Our evaluation shows that DDGacc reduces the cost of computing the DDG by 32x, which reduces overall execution time by 5%-10% on average and by up to 18% for applications where the DBO optimizes large code footprints.
Peer Reviewed. Postprint (published version)
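For readers unfamiliar with the task DDGacc accelerates, the sketch below builds a register-only data dependence graph for one basic block in software, using a last-writer table to emit read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW) edges. It is a hypothetical illustration: the `Instr` encoding and function names are invented here, memory dependences are ignored, and it is not DynamoRIO's API or the paper's implementation. A second helper applies Amdahl's law as a first-order check on how a 32x speedup of a phase that takes 5%-10% of execution time translates into overall savings.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dsts: Tuple[str, ...]  # registers written (hypothetical encoding)
    srcs: Tuple[str, ...]  # registers read


def build_ddg(block: List[Instr]) -> Set[Tuple[int, int, str]]:
    """Return (producer, consumer, kind) edges over instruction indices."""
    edges = set()
    last_writer = {}              # register -> index of its latest writer
    readers = defaultdict(list)   # register -> readers since its latest write
    for i, ins in enumerate(block):
        for r in ins.srcs:
            if r in last_writer:
                edges.add((last_writer[r], i, "RAW"))  # true dependence
            readers[r].append(i)
        for r in ins.dsts:
            if r in last_writer:
                edges.add((last_writer[r], i, "WAW"))  # output dependence
            for j in readers[r]:
                if j != i:
                    edges.add((j, i, "WAR"))           # anti-dependence
            last_writer[r] = i
            readers[r] = []
    return edges


def overall_reduction(fraction: float, speedup: float) -> float:
    """Amdahl's law: share of total time saved by speeding up one phase."""
    return fraction * (1 - 1 / speedup)


if __name__ == "__main__":
    block = [
        Instr(("r1",), ("r2",)),        # 0: r1 <- load [r2]
        Instr(("r3",), ("r1", "r4")),   # 1: r3 <- r1 + r4
        Instr(("r1",), ("r5",)),        # 2: r1 <- r5 * 2
        Instr(("r6",), ("r1", "r3")),   # 3: r6 <- r1 + r3
    ]
    print(sorted(build_ddg(block)))
    print(f"{overall_reduction(0.05, 32):.1%} to {overall_reduction(0.10, 32):.1%}")
```

For the four-instruction block, the sketch finds RAW edges 0→1, 1→3 and 2→3, a WAW edge 0→2 and a WAR edge 1→2. The Amdahl estimate gives about 4.8% to 9.7% of total time saved, roughly in line with the 5%-10% average reduction reported above; the paper's figures come from its own evaluation, not from this formula.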
Modelling HW/SW Co-Designed Processors
This paper presents DARCO, an extensible platform for modelling HW/SW co-designed processors with different guest and host ISAs. Its Emulation Software Layer (ESL) provides staged compilation, which translates and optimizes x86 binaries to run on a PowerPC processor. In addition to the functional models, DARCO provides timing simulators and a powerful debugging toolchain. DARCO has a functional emulation speed of 8 million x86 instructions per second