14 research outputs found

    Unified architecture level energy-efficiency metric

    No full text
    The development of power-efficient microprocessors presents the need to consider power consumption at early stages of design, particularly at the ISA and microarchitecture definition stages, where the potential for power savings is more significant than at lowerlevel stages, and the opportunity for making power-performance tradeoffs is the largest. Design modifications to the ISA and microarchitecture, however, affect most (if not all) parameters of the design, including architectural speed, code density, clocking rate and power. A reliable metric is required to make knowledgeable power-performance tradeoffs in this multi-dimensional space. This paper derives a unified energy-efficiency metric for evaluating ISA and microarchitecture features, which subsumes other commonly used power-performance metrics as special cases of a more general equation. This new metric is derived based on an analysis of a multi-dimensional power optimization problem, and the resulting formula involves only relative changes in the characteristics of a processor, enabling its application at the early stages of the design

    Design Methodology for Semi Custom Processor Cores ABSTRACT

    No full text
    We describe a semi-custom design methodology for embedded processor cores that was prototyped through the development of a low power high performance DSP core. When compared to the standard ASIC design flow, this methodology enables significant improvement in the speed and power; such benefits are obtained without compromising the generality and flexibility that characterizes the ASIC-based design techniques. Our methodology achieves fast turn-around time in the process from RTL description to post-PD timing results, and exhibits stable convergence on timing; these characteristics enable the application of optimizations spanning multiple levels of the design hierarchy. Such optimizations proved to be much more effective than those that focus only on a single design stage

    Interconnections in Multi-core Architectures: Understanding Mechanisms, Overheads and Scaling

    No full text
    This paper examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget. This research shows that designs that treat interconnect as an entity that can be independently architected and optimized would not arrive at the best multicore design. Several examples are presented showing the need for careful co-design. For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or cache sizes, and does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for. A hierarchical bus structure is examined which negates some of the performance costs of the assumed baseline architecture.

    Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling

    No full text
    This paper examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures. It shows that the design choices for the interconnect have significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget. This research shows that designs that treat interconnect as an entity that can be independently architected and optimized would not arrive at the best multicore design. Several examples are presented showing the need for careful co-design. For instance, increasing interconnect bandwidth requires area that then constrains the number of cores or cache sizes, and does not necessarily increase performance. Also, shared level-2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for. A hierarchical bus structure is examined which negates some of the performance costs of the assumed baseline architecture.

    Optimizing pipelines for power and performance

    No full text
    During the concept phase and definition of next generation high-end processors, power and performance will need to be weighted appropriately to deliver competitive cost/performance. It is not enough to adopt a CPI-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
    corecore