
    Frontend frequency-voltage adaptation for optimal energy-delay²

    In this paper, we present a clustered, multiple-clock domain (CMCD) microarchitecture that combines the benefits of both clustering and globally asynchronous, locally synchronous (GALS) designs. We also present a mechanism for dynamically adapting the frequency and voltage of the CMCD frontend with the goal of optimizing the energy-delay² product (ED2P). Our mechanism has minimal hardware cost, is entirely self-adjusting, does not depend on any thresholds, and achieves results close to optimal. We evaluate it on 16 SPEC 2000 applications and report an average ED2P reduction of 17.5% (80% of the upper bound).
    Peer Reviewed. Postprint (published version).
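
The ED2P metric above is energy multiplied by delay squared, so a slowdown is penalized more heavily than an equal energy saving is rewarded. A minimal sketch, assuming normalized, hypothetical energy and delay values (none come from the paper), and assuming "80% of the upper bound" means the achieved reduction divided by the best possible reduction:

```python
def ed2p(energy: float, delay: float) -> float:
    """Energy-delay^2 product: delay is weighted quadratically."""
    return energy * delay ** 2


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of a metric (0.175 means 17.5% lower)."""
    return (baseline - improved) / baseline


def implied_upper_bound(achieved: float, fraction_of_bound: float) -> float:
    """Largest reduction implied if `achieved` is `fraction_of_bound` of it."""
    return achieved / fraction_of_bound


# Hypothetical operating points, normalized to the baseline configuration:
# the frontend saves 10% energy but the program runs 2% longer.
baseline = ed2p(energy=1.0, delay=1.0)
adapted = ed2p(energy=0.90, delay=1.02)
print(f"ED2P reduction: {relative_reduction(baseline, adapted):.1%}")

# Reading "17.5% on average (80% of the upper bound)" as a ratio of reductions.
print(f"Implied upper bound: {implied_upper_bound(0.175, 0.80):.1%}")
```

With these made-up numbers a 10% energy saving yields only about a 6.4% ED2P reduction, because the 2% slowdown is squared; under the stated reading, the reported 17.5% implies an upper bound of roughly 21.9%.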

    On-Chip Transparent Wire Pipelining (invited paper)

    Wire pipelining has been proposed as a viable means of addressing the discrepancy between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications to integrate it seamlessly into the current design flow. In this paper, we briefly survey the methods presented by other researchers in the field and then thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects.
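
The core sizing question in wire pipelining is how many clock cycles, and therefore how many pipeline registers, a long wire needs. A minimal sketch, assuming a uniform per-register overhead and hypothetical delay figures; it ignores placement constraints and the protocol changes the abstract refers to:

```python
import math


def pipeline_cycles(wire_delay_ps: float, clock_period_ps: float,
                    register_overhead_ps: float = 0.0) -> int:
    """Clock cycles needed to traverse a wire once it is pipelined.

    Each segment between registers must fit in one period minus the
    register overhead (setup + clock-to-q), assumed uniform here.
    """
    usable = clock_period_ps - register_overhead_ps
    if usable <= 0:
        raise ValueError("register overhead leaves no time for the wire")
    return max(1, math.ceil(wire_delay_ps / usable))


def registers_to_insert(wire_delay_ps: float, clock_period_ps: float,
                        register_overhead_ps: float = 0.0) -> int:
    """Pipeline registers placed along the wire: one per extra cycle."""
    return pipeline_cycles(wire_delay_ps, clock_period_ps, register_overhead_ps) - 1


# Hypothetical global wire: 750 ps of delay, 250 ps clock, 50 ps register overhead.
print(pipeline_cycles(750, 250), registers_to_insert(750, 250))
print(pipeline_cycles(750, 250, 50), registers_to_insert(750, 250, 50))
```

Each extra cycle is latency that the surrounding logic must tolerate, which is one reason the technique cannot simply be dropped into an existing design flow.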

    Memory bank predictors

    Cache memories are commonly implemented with multiple memory banks to improve bandwidth and latency. Knowing early which data cache bank an instruction will access can improve performance in several ways. One scenario that is likely to become increasingly important is clustered microprocessors with a distributed cache. This work presents a study of different cache bank predictors. We show that effective bank predictors can be implemented at relatively low cost. For instance, a predictor of approximately 4 KB achieves an average hit rate of 78% on SPECint2000 when predicting accesses to an 8-bank cache in a contemporary superscalar processor. We also show how a predictor can be used to reduce the communication latency caused by memory accesses in a clustered microarchitecture with a distributed cache.
    Peer Reviewed. Postprint (published version).
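
To make the idea of a bank predictor concrete, here is a minimal sketch of one simple scheme: a PC-indexed table that predicts the bank an instruction touched last time. This is an illustrative last-value predictor, not necessarily one of the designs studied in the paper; the table size, 4-byte instructions, 32-byte lines, and line-interleaved bank mapping are all assumptions.

```python
class LastBankPredictor:
    """Hypothetical PC-indexed table that remembers the last bank each
    memory instruction accessed and predicts it will repeat."""

    def __init__(self, entries: int = 1024, num_banks: int = 8):
        self.entries = entries
        self.num_banks = num_banks
        self.table = [0] * entries  # one predicted bank id per entry

    def _index(self, pc: int) -> int:
        # Drop the low 2 bits (4-byte instructions assumed), then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> int:
        return self.table[self._index(pc)]

    def update(self, pc: int, actual_bank: int) -> None:
        self.table[self._index(pc)] = actual_bank


def bank_of(address: int, num_banks: int = 8, line_bytes: int = 32) -> int:
    """Line-interleaved banking: consecutive cache lines map to consecutive banks."""
    return (address // line_bytes) % num_banks


def hit_rate(trace, predictor: LastBankPredictor) -> float:
    """Fraction of (pc, address) accesses whose bank was predicted correctly."""
    if not trace:
        return 0.0
    hits = 0
    for pc, address in trace:
        actual = bank_of(address, predictor.num_banks)
        hits += predictor.predict(pc) == actual
        predictor.update(pc, actual)
    return hits / len(trace)


# Hypothetical trace: one load re-reading a line in bank 1, and one load
# striding through consecutive lines, so its bank changes on every access.
trace = [(0x400, 0x1020)] * 4 + [(0x404, 0x2000 + 32 * i) for i in range(4)]
print(hit_rate(trace, LastBankPredictor()))
```

On this toy trace the predictor scores 0.5: the repeating load is predicted well, while the strided load defeats a last-value scheme. For scale, 1024 entries of 3 bits (enough for 8 banks) occupy 384 bytes.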

    Thermal-aware clustered microarchitectures

    As frequencies and feature sizes scale faster than operating voltages, power density increases with each processor generation, and the cost of removing the resulting heat grows at the same rate. Leakage increases significantly with every process generation and is expected to become the main source of power consumption in the near future. Moreover, leakage power grows exponentially with temperature. This paper proposes and evaluates several techniques with two goals: reducing average temperature to decrease leakage power, and reducing peak temperature to reduce cooling cost. Combinations of temperature-aware steering techniques and cluster hopping are investigated in a quad-cluster superscalar microarchitecture. Combining cluster hopping with a temperature-aware steering policy yields a 30% reduction in leakage power and an 8% reduction in average peak temperature, at the cost of a slowdown of just 5%.
    Peer Reviewed. Postprint (published version).
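
The two mechanisms named above can be illustrated with a toy model: steering sends each instruction to the coolest cluster that has room, and hopping periodically restricts execution to the coolest clusters so the others can cool down. A minimal sketch, assuming hypothetical per-cluster temperature sensors and issue-queue occupancy; the paper's actual policies and hopping intervals may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Cluster:
    cid: int
    temperature: float  # hypothetical per-cluster sensor reading, in deg C
    free_slots: int     # free issue-queue entries


def steer(clusters: List[Cluster], active: Set[int]) -> Optional[int]:
    """Temperature-aware steering: pick the coolest active cluster with room."""
    candidates = [c for c in clusters if c.cid in active and c.free_slots > 0]
    if not candidates:
        return None  # no room anywhere: dispatch stalls this cycle
    return min(candidates, key=lambda c: c.temperature).cid


def hop(clusters: List[Cluster], k: int) -> Set[int]:
    """Cluster hopping: for the next interval, run only on the k coolest
    clusters and leave the rest idle so they can cool down."""
    coolest = sorted(clusters, key=lambda c: c.temperature)[:k]
    return {c.cid for c in coolest}


quad = [Cluster(0, 80.0, 4), Cluster(1, 70.0, 4),
        Cluster(2, 75.0, 4), Cluster(3, 90.0, 4)]
active = hop(quad, k=2)  # clusters 1 and 2 are the coolest
print(sorted(active), steer(quad, active))
```

Because leakage grows with temperature, both policies aim to keep the hottest cluster from staying hot; the performance cost comes from running on fewer clusters and from stalls when the preferred cluster is full.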

    Energy Saving Techniques for Phase Change Memory (PCM)

    In recent years, the energy consumption of computing systems has increased, and a large fraction of this energy is consumed in main memory. In response, researchers have proposed the use of non-volatile memory, such as phase change memory (PCM), which has low read latency and power and nearly zero leakage power. However, the write latency and power of PCM are very high, and these, together with the limited write endurance of PCM, present significant challenges to its widespread adoption. To address this, several architecture-level techniques have been proposed. In this report, we review several techniques for managing the power consumption of PCM. We also classify these techniques based on their characteristics to provide insight into them. The aim of this work is to encourage researchers to propose even better techniques for improving the energy efficiency of PCM-based main memory.
    Comment: Survey, phase change RAM (PCRAM)
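
Because PCM writes are expensive, many architecture-level techniques reduce the number of cells programmed per write. As one well-known example of this class (not necessarily among those covered in the survey), Flip-N-Write combines data-comparison write, which programs only changed cells, with storing each word either as-is or inverted, tracked by one extra flip bit. A minimal sketch, assuming 32-bit words and that write energy is proportional to the number of cells programmed:

```python
def cells_changed(old: int, new: int) -> int:
    """Cells whose stored bit differs; data-comparison write programs only these."""
    return bin(old ^ new).count("1")


def flip_n_write(stored: int, flipped: bool, data: int, width: int = 32):
    """Store `data` either as-is or inverted, whichever programs fewer cells.

    Returns (new_stored_bits, new_flip_bit, cells_programmed). The flip bit
    is one extra cell per word and is counted when it changes.
    """
    mask = (1 << width) - 1
    plain, inverted = data & mask, ~data & mask
    cost_plain = cells_changed(stored, plain) + (1 if flipped else 0)
    cost_inverted = cells_changed(stored, inverted) + (0 if flipped else 1)
    if cost_inverted < cost_plain:
        return inverted, True, cost_inverted
    return plain, False, cost_plain


def read_word(stored: int, flipped: bool, width: int = 32) -> int:
    """Undo the optional inversion on a read."""
    return stored ^ ((1 << width) - 1) if flipped else stored


# Writing all-ones over an all-zeros word: 32 cells as-is, 1 flip cell inverted.
stored, flipped, cost = flip_n_write(0, False, 0xFFFFFFFF)
print(hex(read_word(stored, flipped)), cost)
```

Every data cell differs from exactly one of the plain or inverted encodings, and exactly one of the two options changes the flip bit, so the two costs sum to 33 for a 32-bit word and the cheaper option never programs more than 16 cells.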

    Verification of timed circuits with failure directed abstractions

    Journal Article
    This paper presents a method to address state explosion in timed circuit verification by using abstraction directed by the failure model. This method allows us to decompose the verification problem into a set of subproblems, each of which proves that a specific failure condition does not occur. Abstraction is applied to each subproblem using safe transformations to reduce the complexity of verification. The abstraction conservatively preserves all behaviors of the concrete description that are essential to the specific failure model. Therefore, no violations of the given failure model are missed when only the abstract description is analyzed. An algorithm is also presented that examines the abstract error trace to either find a concrete error trace or report that it is a false negative. This paper presents results of applying the proposed failure directed abstractions to two large timed circuit designs.
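
The decompose, abstract, check, and examine-trace loop described above can be shown on a toy untimed model. A minimal sketch, assuming boolean variables and guarded update rules (all names hypothetical); the "safe transformation" here simply drops variables that a given failure condition does not mention, which weakens guards and therefore over-approximates behavior. Timing is not modeled, and the paper's own abstractions and trace-analysis algorithm are not reproduced.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, bool]
Rule = Tuple[str, State, State]  # (name, guard, update) over boolean variables


def enabled(state: State, guard: State) -> bool:
    return all(state[v] == b for v, b in guard.items())


def abstract_rules(rules: List[Rule], keep: set) -> List[Rule]:
    """Safe transformation: drop hidden variables from guards and updates.
    A weaker guard admits every concrete move (and possibly more), so the
    abstract model over-approximates the behaviors relevant to `keep`."""
    def restrict(d: State) -> State:
        return {v: b for v, b in d.items() if v in keep}
    return [(name, restrict(g), restrict(u)) for name, g, u in rules]


def find_trace(init: State, rules: List[Rule],
               failure: Callable[[State], bool]) -> Optional[List[str]]:
    """Breadth-first search for a rule sequence that reaches `failure`."""
    start = tuple(sorted(init.items()))
    parent = {start: None}
    todo = deque([start])
    while todo:
        s = todo.popleft()
        state = dict(s)
        if failure(state):
            trace = []
            while parent[s] is not None:
                s, name = parent[s]
                trace.append(name)
            return trace[::-1]
        for name, guard, update in rules:
            if enabled(state, guard):
                t = tuple(sorted({**state, **update}.items()))
                if t not in parent:
                    parent[t] = (s, name)
                    todo.append(t)
    return None


def replay(init: State, rules: List[Rule], trace: List[str],
           failure: Callable[[State], bool]) -> bool:
    """Check whether the abstract trace is executable on the concrete model."""
    by_name = {name: (g, u) for name, g, u in rules}
    state = dict(init)
    for name in trace:
        guard, update = by_name[name]
        if not enabled(state, guard):
            return False
        state.update(update)
    return failure(state)


def check_failure(init: State, rules: List[Rule],
                  failure: Callable[[State], bool], relevant: set):
    """One subproblem: try to prove that a single failure condition cannot occur.
    `failure` must read only variables in `relevant`."""
    abs_init = {v: b for v, b in init.items() if v in relevant}
    trace = find_trace(abs_init, abstract_rules(rules, relevant), failure)
    if trace is None:
        return "safe"  # unreachable even in the over-approximation
    if replay(init, rules, trace, failure):
        return ("error", trace)
    return ("false negative", trace)


rules = [("r1", {}, {"req": True}),
         ("r2", {"req": True, "lock": True}, {"err": True}),
         ("r3", {"req": True}, {"ack": True})]
init = {"req": False, "ack": False, "lock": False, "err": False}
print(check_failure(init, rules, lambda s: s["err"], {"req", "err"}))
```

Here the abstraction hides `lock`, so it finds the trace `r1, r2`, which the concrete model cannot execute. The "false negative" label follows the abstract's terminology. This toy replays only the exact abstract rule sequence, so it can mislabel a trace when extra steps on hidden variables would make the failure reachable; a real trace-analysis algorithm must account for that.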

    An Automated Design Flow for Approximate Circuits based on Reduced Precision Redundancy

    Reduced Precision Redundancy (RPR) is a popular Approximate Computing technique in which a circuit operated under Voltage Over-Scaling (VOS) is paired with a faster, reduced-bitwidth replica, so that VOS-induced timing errors are partially recovered by the replica and their impact is mitigated. Previous works have provided various examples of effective RPR implementations, which, however, suffer from three limitations: first, these circuits are designed using ad-hoc procedures, and no generalization is provided; second, error impact analysis is carried out statistically, thus neglecting issues such as non-elementary data distributions and temporal correlation; last, only dynamic power was considered in the optimization. In this work, we propose a new generalized approach to RPR that overcomes all these limitations by leveraging the capabilities of state-of-the-art synthesis and simulation tools. By sacrificing theoretical provability in favor of an empirical, input-based analysis, we build a design tool able to automatically add RPR to a preexisting gate-level netlist. Thanks to this method, we are able to refute some of the conclusions drawn in previous works, in particular those related to statistical assumptions on inputs; we show that a given input distribution may yield extremely different results depending on its temporal behavior.
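
The recovery step in RPR is a comparison between the main (VOS) output and the replica's estimate. A minimal sketch of that classic decision rule, assuming an unsigned adder, a replica that drops the k least significant bits of each operand, and a hypothetical single-bit timing error; it does not reproduce the paper's netlist-level design flow.

```python
def replica_add(a: int, b: int, dropped_bits: int) -> int:
    """Reduced-precision replica: add only the MSBs, then rescale."""
    return ((a >> dropped_bits) + (b >> dropped_bits)) << dropped_bits


def rpr_threshold(dropped_bits: int) -> int:
    """Largest gap between the exact sum and the replica for unsigned inputs:
    each operand loses at most 2**k - 1 in its dropped low bits."""
    return 2 * ((1 << dropped_bits) - 1)


def rpr_select(main_out: int, replica_out: int, threshold: int) -> int:
    """Classic RPR decision: keep the VOS output unless it strays from the
    replica by more than the replica's own worst-case error."""
    return replica_out if abs(main_out - replica_out) > threshold else main_out


a, b, k = 1000, 2000, 4
estimate = replica_add(a, b, k)  # 2992
th = rpr_threshold(k)            # 30
correct = a + b                  # 3000
faulty = correct ^ (1 << 11)     # hypothetical VOS timing error in bit 11
print(rpr_select(correct, estimate, th), rpr_select(faulty, estimate, th))
```

Setting the threshold to the replica's worst-case truncation error means a correct main output is never replaced, while a large MSB error is bounded by the replica's small error. Real VOS errors depend on the previous inputs as well as the current ones, which is the temporal effect the abstract emphasizes and this sketch does not model.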