1,934 research outputs found

    HALLS: An Energy-Efficient Highly Adaptable Last Level STT-RAM Cache for Multicore Systems

    Get PDF
    Spin-Transfer Torque RAM (STT-RAM) is widely considered a promising alternative to SRAM in the memory hierarchy due to STT-RAM's non-volatility, low leakage power, high density, and fast read speed. The STT-RAM's small feature size is particularly desirable for the last-level cache (LLC), which typically consumes a large area of silicon die. However, long write latency and high write energy still remain challenges of implementing STT-RAMs in the CPU cache. An increasingly popular method for addressing this challenge involves trading off the non-volatility for reduced write speed and write energy by relaxing the STT-RAM's data retention time. However, in order to maximize energy saving potential, the cache configurations, including STT-RAM's retention time, must be dynamically adapted to executing applications' variable memory needs. In this paper, we propose a highly adaptable last level STT-RAM cache (HALLS) that allows the LLC configurations and retention time to be adapted to applications' runtime execution requirements. We also propose low-overhead runtime tuning algorithms to dynamically determine the best (lowest energy) cache configurations and retention times for executing applications. Compared to prior work, HALLS reduced the average energy consumption by 60.57% in a quad-core system, while introducing marginal latency overhead.Comment: To Appear on IEEE Transactions on Computers (TC

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    SACR: Scheduling-Aware Cache Reconfiguration for Real-Time Embedded Systems

    Full text link
    Dynamic reconfiguration techniques are widely used for efficient system optimization. Dynamic cache reconfiguration is a promising approach for reducing energy consumption as well as for improving overall system performance. It is a major challenge to introduce cache reconfiguration into real-time embedded systems since dynamic analysis may adversely affect tasks with real-time constraints. This paper presents a novel approach for implementing cache reconfiguration in soft real-time systems by efficiently leveraging static analysis during execution to both minimize energy and maximize performance. To the best of our knowledge, this is the first attempt to integrate dynamic cache reconfiguration in real-time scheduling techniques. Our experimental results using a wide variety of applications have demonstrated that our approach can significantly (up to 74%) reduce the overall energy consumption of the cache hierarchy in soft real-time systems. 1

    A low-power cache system for high-performance processors

    Get PDF
    制度:新 ; 報告番号:甲3439号 ; 学位の種類:博士(工学) ; 授与年月日:12-Sep-11 ; 早大学位記番号:新576

    BRISC-V: An Open-Source Architecture Design Space Exploration Toolbox

    Full text link
    In this work, we introduce a platform for register-transfer level (RTL) architecture design space exploration. The platform is an open-source, parameterized, synthesizable set of RTL modules for designing RISC-V based single and multi-core architecture systems. The platform is designed with a high degree of modularity. It provides highly-parameterized, composable RTL modules for fast and accurate exploration of different RISC-V based core complexities, multi-level caching and memory organizations, system topologies, router architectures, and routing schemes. The platform can be used for both RTL simulation and FPGA based emulation. The hardware modules are implemented in synthesizable Verilog using no vendor-specific blocks. The platform includes a RISC-V compiler toolchain to assist in developing software for the cores, a web-based system configuration graphical user interface (GUI) and a web-based RISC-V assembly simulator. The platform supports a myriad of RISC-V architectures, ranging from a simple single cycle processor to a multi-core SoC with a complex memory hierarchy and a network-on-chip. The modules are designed to support incremental additions and modifications. The interfaces between components are particularly designed to allow parts of the processor such as whole cache modules, cores or individual pipeline stages, to be modified or replaced without impacting the rest of the system. The platform allows researchers to quickly instantiate complete working RISC-V multi-core systems with synthesizable RTL and make targeted modifications to fit their needs. The complete platform (including Verilog source code) can be downloaded at https://ascslab.org/research/briscv/explorer/explorer.html.Comment: In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '19
    corecore