559 research outputs found

    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Get PDF
    Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our life. However, the increasing unreliability of devices fabricated in nanoscale technologies emerged as a major threat for the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at microarchitecture-level and above

    Fine-grained Energy and Thermal Management using Real-time Power Sensors

    Get PDF
    With extensive use of battery powered devices such as smartphones, laptops an

    A survey of emerging architectural techniques for improving cache energy consumption

    Get PDF
    The search goes on for another ground breaking phenomenon to reduce the ever-increasing disparity between the CPU performance and storage. There are encouraging breakthroughs in enhancing CPU performance through fabrication technologies and changes in chip designs but not as much luck has been struck with regards to the computer storage resulting in material negative system performance. A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures. This work is a survey of energy saving techniques which are grouped on whether they save the dynamic energy, leakage energy or both. Needless to mention, the aim of this work is to compile a quick reference guide of energy saving techniques from 2013 to 2016 for engineers, researchers and students

    Coarse-Grained Online Monitoring of BTI Aging by Reusing Power-Gating Infrastructure

    Get PDF
    In this paper, we present a novel coarse-grained technique for monitoring online the bias temperature instability (BTI) aging of circuits by exploiting their power gating infrastructure. The proposed technique relies on monitoring the discharge time of the virtual-power-network during standby operations, the value of which depends on the threshold voltage of the CMOS devices in a power-gated design (PGD). It does not require any distributed sensors, because the virtual-power-network is already distributed in a PGD. It consists of a hardware block for measuring the discharge time concurrently with normal standby operations and a processing block for estimating the BTI aging status of the PGD according to collected measurements. Through SPICE simulation, we demonstrate that the BTI aging estimation error of the proposed technique is less than 1% and 6.2% for PGDs with static operating frequency and dynamic voltage and frequency scaling, respectively. Its area cost is also found negligible. The power gating minimum idle time (MIT) cost induced by the energy consumed for monitoring the discharge time is evaluated on two scalar machine models using either x86 or ARM instruction sets. It is found less than 1.3× and 1.45× the original power gating MIT, respectively. We validate the proposed technique through accelerated aging experiments conducted with five actual chips that contain an ARM cortex M0 processor, manufactured with a 65 nm CMOS technology

    高電力効率プロセッサのためのキャッシュの設計最適化

    Get PDF
    学位の種別: 課程博士審査委員会委員 : (主査)東京大学教授 中村 宏, 東京大学教授 原 辰次, 東京大学教授 石川 正俊, 東京大学准教授 近藤 正章, 東京大学准教授 品川 高廣, 東京大学准教授 入江 英嗣University of Tokyo(東京大学

    Optimizing SIMD execution in HW/SW co-designed processors

    Get PDF
    SIMD accelerators are ubiquitous in microprocessors from different computing domains. Their high compute power and hardware simplicity improve overall performance in an energy efficient manner. Moreover, their replicated functional units and simple control mechanism make them amenable to scaling to higher vector lengths. However, code generation for these accelerators has been a challenge from the days of their inception. Compilers generate vector code conservatively to ensure correctness. As a result they lose significant vectorization opportunities and fail to extract maximum benefits out of SIMD accelerators. This thesis proposes to vectorize the program binary at runtime in a speculative manner, in addition to the compile time static vectorization. There are different environments that support runtime profiling and optimization support required for dynamic vectorization, one of most prominent ones being: 1) Dynamic Binary Translators and Optimizers (DBTO) and 2) Hardware/Software (HW/SW) Co-designed Processors. HW/SW co-designed environment provides several advantages over DBTOs like transparent incorporations of new hardware features, binary compatibility, etc. Therefore, we use HW/SW co-designed environment to assess the potential of speculative dynamic vectorization. Furthermore, we analyze vector code generation for wider vector units and find out that even though SIMD accelerators are amenable to scaling from the hardware point of view, vector code generation at higher vector length is even more challenging. The two major factors impeding vectorization for wider SIMD units are: 1) Reduced dynamic instruction stream coverage for vectorization and 2) Large number of permutation instructions. To solve the first problem we propose Variable Length Vectorization that iteratively vectorizes for multiple vector lengths to improve dynamic instruction stream coverage. Secondly, to reduce the number of permutation instructions we propose Selective Writing that selectively writes to different parts of a vector register and avoids permutations. Finally, we tackle the problem of leakage energy in SIMD accelerators. Since SIMD accelerators consume significant amount of real estate on the chip, they become the principle source of leakage if not utilized judiciously. Power gating is one of the most widely used techniques to reduce leakage energy of functional units. However, power gating has its own energy and performance overhead associated with it. We propose to selectively devectorize the vector code when higher SIMD lanes are used intermittently. This selective devectorization keeps the higher SIMD lanes idle and power gated for maximum duration. Therefore, resulting in overall leakage energy reduction.Postprint (published version

    Chapter One – An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques

    Get PDF
    Power dissipation and energy consumption became the primary design constraint for almost all computer systems in the last 15 years. Both computer architects and circuit designers intent to reduce power and energy (without a performance degradation) at all design levels, as it is currently the main obstacle to continue with further scaling according to Moore's law. The aim of this survey is to provide a comprehensive overview of power- and energy-efficient “state-of-the-art” techniques. We classify techniques by component where they apply to, which is the most natural way from a designer point of view. We further divide the techniques by the component of power/energy they optimize (static or dynamic), covering in that way complete low-power design flow at the architectural level. At the end, we conclude that only a holistic approach that assumes optimizations at all design levels can lead to significant savings.Peer ReviewedPostprint (published version
    corecore