10 research outputs found

    Quantitative Characterization of the Software Layer of a HW/SW Co-Designed Processor

    Get PDF
    HW/SW co-designed processors currently have a renewed interest due to their capability to boost performance without running into the power and complexity walls. By employing a software layer that performs dynamic binary translation and applies aggressive optimizations through exploiting the runtime application behavior, these hybrid architectures provide better performance/watt. However, a poorly designed software layer can result in significant translation/optimization overheads that may offset its benefits. This work presents a detailed characterization of the software layer of a HW/SW co-designed processor using a variety of benchmark suites. We observe that the performance of the software layer is very sensitive to the characteristics of the emulated application with a variance of more than 50%. We also show that the interaction between the software layer and the emulated application, while sharing the microarchitectural resources, can have 0-20% impact on performance. Finally, we identify some key elements which should be further investigated to reduce the observed variations in performance. The paper provides critical insights to improve the software layer design.Peer ReviewedPostprint (author's final draft

    HW/SW Co-designed Processors: Challenges, Design Choices and a Simulation Infrastructure for Evaluation

    Get PDF
    Improving single thread performance is a key challenge in modern microprocessors especially because the traditional approach of increasing clock frequency and deep pipelining cannot be pushed further due to power constraints. Therefore, researchers have been looking at unconventional architectures to boost single thread performance without running into the power wall. HW/SW co-designed processors like Nvidia Denver, are emerging as a promising alternative. However, HW/SW co-designed processors need to address some key challenges such as startup delay, providing high performance with simple hardware, translation/optimization overhead, etc. before they can become mainstream. A fundamental requirement for evaluating different design choices and trade-offs to meet these challenges is to have a simulation infrastructure. Unfortunately, there is no such infrastructure available today. Building the aforementioned infrastructure itself poses significant challenges as it encompasses the complexities of not only an architectural framework but also of a compilation one. This paper identifies the key challenges that HW/SW codesigned processors face and the basic requirements for a simulation infrastructure targeting these architectures. Furthermore, the paper presents DARCO, a simulation infrastructure to enable research in this domain.Peer ReviewedPostprint (author's final draft

    DDGacc: boosting dynamic DDG-based binary optimizations through specialized hardware support

    No full text
    Dynamic Binary Translators (DBT) and Dynamic Binary Opti- mization (DBO) by software are used widely for several reasons including performance, design simplification and virtualization. However, the software layer in such systems introduces non- negligible overheads which affect performance and user experi- ence. Hence, reducing DBT/DBO overheads is of paramount im- portance. In addition, reduced overheads have interesting collateral effects in the rest of the software layer, such as allowing optimiza- tions to be applied earlier. A cost-effective solution to this problem is to provide hardware support to speed up the primitives of the software layer, paying special attention to automate DBT/DBO mechanisms and leave the heuristics to the software, which is more flexible. In this work, we have characterized the overheads of a DBO sys- tem using DynamoRIO implementing several basic optimizations. We have seen that the computation of the Data Dependence Graph (DDG) accounts for 5%-10% of the execution time. For this rea- son, we propose to add hardware support for this task in the form of a new functional unit, called DDGacc, which is integrated in a conventional pipeline processor and is operated through new ISA instructions. Our evaluation shows that DDGacc reduces the cost of computing the DDG by 32x, which reduces overall execution time by 5%-10% on average and up to 18% for applications where the DBO optimizes large code footprints.Peer ReviewedPostprint (published version

    HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation

    No full text
    Improving single thread performance is a key challenge in modern microprocessors especially because the traditional approach of increasing clock frequency and deep pipelining cannot be pushed further due to power constraints. Therefore, researchers have been looking at unconventional architectures to boost single thread performance without running into the power wall. HW/SW co-designed processors like Nvidia Denver, are emerging as a promising alternative. However, HW/SW co-designed processors need to address some key challenges such as startup delay, providing high performance with simple hardware, translation/optimization overhead, etc. before they can become mainstream. A fundamental requirement for evaluating different design choices and trade-offs to meet these challenges is to have a simulation infrastructure. Unfortunately, there is no such infrastructure available today. Building the aforementioned infrastructure itself poses significant challenges as it encompasses the complexities of not only an architectural framework but also of a compilation one. This paper identifies the key challenges that HW/SW codesigned processors face and the basic requirements for a simulation infrastructure targeting these architectures. Furthermore, the paper presents DARCO, a simulation infrastructure to enable research in this domain.Peer Reviewe

    Quantitative characterization of the software layer of a HW/SW co-designed processor

    No full text
    HW/SW co-designed processors currently have a renewed interest due to their capability to boost performance without running into the power and complexity walls. By employing a software layer that performs dynamic binary translation and applies aggressive optimizations through exploiting the runtime application behavior, these hybrid architectures provide better performance/watt. However, a poorly designed software layer can result in significant translation/optimization overheads that may offset its benefits. This work presents a detailed characterization of the software layer of a HW/SW co-designed processor using a variety of benchmark suites. We observe that the performance of the software layer is very sensitive to the characteristics of the emulated application with a variance of more than 50%. We also show that the interaction between the software layer and the emulated application, while sharing the microarchitectural resources, can have 0-20% impact on performance. Finally, we identify some key elements which should be further investigated to reduce the observed variations in performance. The paper provides critical insights to improve the software layer design.Peer Reviewe

    Modelling HW/SW Co-Designed Processors

    No full text
    This paper presents DARCO, an extensible platform for modelling HW/SW co-designed processors with different guest and host ISAs. Its Emulation Software Layer (ESL) provides staged compilation, which translates and optimizes x86 binaries to run on a PowerPC processor. In addition to the functional models, DARCO provides timing simulators and a powerful debugging toolchain. DARCO has a functional emulation speed of 8 million x86 instructions per second

    DDGacc

    No full text
    corecore