1,934 research outputs found
HALLS: An Energy-Efficient Highly Adaptable Last Level STT-RAM Cache for Multicore Systems
Spin-Transfer Torque RAM (STT-RAM) is widely considered a promising
alternative to SRAM in the memory hierarchy due to STT-RAM's non-volatility,
low leakage power, high density, and fast read speed. The STT-RAM's small
feature size is particularly desirable for the last-level cache (LLC), which
typically consumes a large area of silicon die. However, long write latency and
high write energy still remain challenges of implementing STT-RAMs in the CPU
cache. An increasingly popular method for addressing this challenge involves
trading off the non-volatility for reduced write speed and write energy by
relaxing the STT-RAM's data retention time. However, in order to maximize
energy saving potential, the cache configurations, including STT-RAM's
retention time, must be dynamically adapted to executing applications' variable
memory needs. In this paper, we propose a highly adaptable last level STT-RAM
cache (HALLS) that allows the LLC configurations and retention time to be
adapted to applications' runtime execution requirements. We also propose
low-overhead runtime tuning algorithms to dynamically determine the best
(lowest energy) cache configurations and retention times for executing
applications. Compared to prior work, HALLS reduced the average energy
consumption by 60.57% in a quad-core system, while introducing marginal latency
overhead.Comment: To Appear on IEEE Transactions on Computers (TC
Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements
SACR: Scheduling-Aware Cache Reconfiguration for Real-Time Embedded Systems
Dynamic reconfiguration techniques are widely used for efficient system optimization. Dynamic cache reconfiguration is a promising approach for reducing energy consumption as well as for improving overall system performance. It is a major challenge to introduce cache reconfiguration into real-time embedded systems since dynamic analysis may adversely affect tasks with real-time constraints. This paper presents a novel approach for implementing cache reconfiguration in soft real-time systems by efficiently leveraging static analysis during execution to both minimize energy and maximize performance. To the best of our knowledge, this is the first attempt to integrate dynamic cache reconfiguration in real-time scheduling techniques. Our experimental results using a wide variety of applications have demonstrated that our approach can significantly (up to 74%) reduce the overall energy consumption of the cache hierarchy in soft real-time systems. 1
A low-power cache system for high-performance processors
制度:新 ; 報告番号:甲3439号 ; 学位の種類:博士(工学) ; 授与年月日:12-Sep-11 ; 早大学位記番号:新576
BRISC-V: An Open-Source Architecture Design Space Exploration Toolbox
In this work, we introduce a platform for register-transfer level (RTL)
architecture design space exploration. The platform is an open-source,
parameterized, synthesizable set of RTL modules for designing RISC-V based
single and multi-core architecture systems. The platform is designed with a
high degree of modularity. It provides highly-parameterized, composable RTL
modules for fast and accurate exploration of different RISC-V based core
complexities, multi-level caching and memory organizations, system topologies,
router architectures, and routing schemes. The platform can be used for both
RTL simulation and FPGA based emulation. The hardware modules are implemented
in synthesizable Verilog using no vendor-specific blocks. The platform includes
a RISC-V compiler toolchain to assist in developing software for the cores, a
web-based system configuration graphical user interface (GUI) and a web-based
RISC-V assembly simulator. The platform supports a myriad of RISC-V
architectures, ranging from a simple single cycle processor to a multi-core SoC
with a complex memory hierarchy and a network-on-chip. The modules are designed
to support incremental additions and modifications. The interfaces between
components are particularly designed to allow parts of the processor such as
whole cache modules, cores or individual pipeline stages, to be modified or
replaced without impacting the rest of the system. The platform allows
researchers to quickly instantiate complete working RISC-V multi-core systems
with synthesizable RTL and make targeted modifications to fit their needs. The
complete platform (including Verilog source code) can be downloaded at
https://ascslab.org/research/briscv/explorer/explorer.html.Comment: In Proceedings of the 2019 ACM/SIGDA International Symposium on
Field-Programmable Gate Arrays (FPGA '19
- …