273 research outputs found

    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by an FPU research grant from the Spanish MECD. Peer Reviewed. Postprint (author's final draft).

    RISC-V PROCESSOR PERFORMANCE ANALYSIS OF SECURE DESIGN PRINCIPLES

    This project explores processor microarchitecture features that impact security and performance by conceptualizing and describing a RISC-V processor design with security as the priority. We begin by evaluating causes of several key classes of security vulnerabilities and then considering alternative architectures that address principal causes. We implemented portions of our design in SystemVerilog and demonstrated the functionality and performance of implemented features through simulation. Instantiation efforts are limited to microarchitecture design and writing register-transfer level (RTL) descriptions of the processor; formal verification, synthesis, and fabrication steps are specifically excluded. Specifically, we implemented a single-core RISC-V processor with a modified Harvard architecture for improved isolation of memory resources between privilege levels. Our implementation also mitigates side-channel attacks by avoiding data-dependent timing and adding power-obfuscating features. We found that these changes reduced IPC performance by 55%, due to the increased impact of memory latency, while eliminating most security vulnerabilities due to cache timing, branch prediction, and power analysis. Approved for public release. Distribution is unlimited. Captain, United States Marine Corps. NCWD

    Implementing Legba: Fine-Grained Memory Protection

    Fine-grained hardware protection could provide a powerful and effective means for isolating untrusted code. However, previous techniques for providing fine-grained protection in hardware have led to poor performance. Legba has been proposed as a new caching architecture, designed to reduce the granularity of protection without slowing down the processor. Unfortunately, the designers of Legba have not attempted an implementation. Instead, all of their analysis is based purely on simulations. We present an implementation of the Legba design on a MIPS Core Processor, along with an analysis of our observations and results.

    Coupling Latency-Insensitivity with Variable-Latency for Better Than Worst Case Design: A RISC Case Study

    The gap between worst and typical case delays is bound to increase in nanometer scale technologies due to the spread in process manufacturing parameters. To still profit from scaling, designs should tolerate worst case delays seamlessly and with a minimum performance degradation with respect to the typical case. We present a simple RISC core which tolerates worst case extra latency using the Latency-Insensitive Design approach coupled to a Variable-Latency mechanism. Stalls caused by excessive delay, by data and control hazards, and by late memory access are dealt with in a uniform way. Compared to a pure worst-case approach, our design method allows the core clock frequency to be increased by 23% in a 45 nm CMOS technology, without area or power penalty.

    Design methodology and productivity improvement in high speed VLSI circuits

    2017 Spring. Includes bibliographical references. To view the abstract, please see the full text of the document.

    Automatic synthesis of reconfigurable instruction set accelerators
