89 research outputs found

    Circuit Design, Architecture and CAD for RRAM-based FPGAs

    Get PDF
    Field Programmable Gate Arrays (FPGAs) have been indispensable components of embedded systems and datacenter infrastructures. However, energy efficiency of FPGAs has become a hard barrier preventing their expansion to more application contexts, due to two physical limitations: (1) The massive usage of routing multiplexers causes delay and power overheads as compared to ASICs. To reduce their power consumption, FPGAs have to operate at low supply voltage but sacrifice performance because the transistors drive degrade when working voltage decreases. (2) Using volatile memory technology forces FPGAs to lose configurations when powered off and to be reconfigured at each power on. Resistive Random Access Memories (RRAMs) have strong potentials in overcoming the physical limitations of conventional FPGAs. First of all, RRAMs grant FPGAs non-volatility, enabling FPGAs to be "Normally powered off, Instantly powered on". Second, by combining functionality of memory and pass-gate logic in one unique device, RRAMs can greatly reduce area and delay of routing elements. Third, when RRAMs are embedded into datpaths, the performance of circuits can be independent from their working voltage, beyond the limitations of CMOS circuits. However, researches and development of RRAM-based FPGAs are in their infancy. Most of area and performance predictions were achieved without solid circuit-level simulations and sophisticated Computer Aided Design (CAD) tools, causing the predicted improvements to be less convincing. In this thesis,we present high-performance and low-power RRAM-based FPGAs fromtransistorlevel circuit designs to architecture-level optimizations and CAD tools, using theoretical analysis, industrial electrical simulators and novel CAD tools. We believe that this is the first systematic study in the field, covering: From a circuit design perspective, we propose efficient RRAM-based programming circuits and routing multiplexers through both theoretical analysis and electrical simulations. The proposed 4T(ransitor)1R(RAM) programming structure demonstrates significant improvements in programming current, when compared to most popular 2T1R programming structure. 4T1R-based routingmultiplexer designs are proposed by considering various physical design parasitics, such as intrinsic capacitance of RRAMs and wells doping organization. The proposed 4T1R-based multiplexers outperformbest CMOS implementations significantly in area, delay and power at both nominal and near-Vt regime. From a CAD perspective, we develop a generic FPGA architecture exploration tool, FPGASPICE, modeling a full FPGA fabric with SPICE and Verilog netlists. FPGA-SPICE provides different levels of testbenches and techniques to split large SPICE netlists, in order to obtain better trade-off between simulation time and accuracy. FPGA-SPICE can capture area and power characteristics of SRAM-based and RRAM-based FPGAs more accurately than the currently best analyticalmodels. From an architecture perspective, we propose architecture-level optimizations for RRAMbased FPGAs and quantify their minimumrequirements for RRAM devices. Compared to the best SRAM-based FPGAs, an optimized RRAM-based FPGA architecture brings significant reduction in area, delay and power respectively. In particular, RRAM-based FPGAs operating in the near-Vt regime demonstrate a 5x power improvement without delay overhead as compared to optimized SRAM-based FPGA operating at nominal working voltage

    Optimization Opportunities in RRAM-based FPGA Architectures

    Get PDF
    Static Random Access Memory (SRAM)-based routing multiplexers, whatever structure is employed, share a common limitation: their area, delay and power increase linearly with the input size. This property results in most SRAM-based FPGA architectures typically avoiding the use of large multiplexers. Resistive Random Access Memory (RRAM)- based multiplexers, built with one-level structure, have a unique advantage over SRAM-based multiplexers: their ideal delay is independent from the input size. This property allows RRAM-based FPGA architectures to use larger multiplexers than their SRAM-based counterparts, without generating any delay overhead. In this paper, by carefully considering the properties of RRAM multiplexers, we assess that current state-of-art architectural parameters for SRAM-based FPGAs cannot preserve optimality in the context of RRAM-based FPGAs. As a result, we propose that in RRAM-based FPGAs, (a) the routing tracks should be interconnected to Look-Up Table (LUT) inputs via a one-level crossbar, instead of through Connection Blocks and local routing; (b) the Switch Blocks should employ larger multiplexers; (c) length-2 wires should be used instead of length-4 wires. When operated in nominal voltage, the proposed RRAM-based FPGA architecture reduces area by 26%, delay by 39% and channel width by 13%, as compared to a SRAM-based FPGA with a classical architecture. When operated in the near-Vt regime, the proposed RRAM-based FPGA architecture improves Area-Delay Product by 42% and Power-Delay Product by 5x as compared to a classical SRAM-based FPGA at nominal voltage

    A Ultra-Low-Power FPGA Based on Monolithically Integrated RRAMs (invited)

    Get PDF
    Field Programmable Gate Arrays (FPGAs) rely heavily on complex routing architectures. The routing structures use programmable switches and account for a significant share in the total area, delay and power consumption numbers. With the ability of being monolithically integrated with CMOS chips, Resistive Random Access Memories (RRAMs) enable high-performance routing architectures through the replacement of Static Random Access Memory (SRAM)-based programming switches. Exploiting the very low on-resistance state achievable by RRAMs as well as the improved tolerance to power supply reduction, RRAM-based routing multiplexers can be used to significantly reduce the power consumption of FPGA systems with no performance compromises. By evaluating the opportunities of ultra-low-power RRAM-based FPGAs at the system level, we see an improvement of 12%, 26% and 81% in area, delay and power consumption at a mature technology node

    Accurate Power Analysis for Near-Vt RRAM-based FPGA

    Get PDF
    Resistive Random Access Memory (RRAM)-based FPGA architectures employ RRAMs not only as memories to store the configuration but embed them in the datapaths of programmable routing resources to propagate signals with improved performances. Sources of power consumption have been intensively studied for conventional Static Random Access Memories (SRAM)-based FPGAs. However, very limited works focused so far on studying the power characteristics of RRAM-based FPGAs. In this paper, we first analyze the power characteristics of RRAM-based multiplexer at circuit level and then use electrical simulations to study power consumption of RRAM-based FPGA architectures. Experimental results show that RRAM-based FPGAs achieve a Power-Delay Product reduced by 50% compared to SRAM-based FPGA at nominal voltage and 20% compared to near-Vt SRAM-based FPGA, respectively

    A high-performance low-power near-Vt RRAM-based FPGA

    Full text link

    A Study on Buffer Distribution for RRAM-based FPGA Routing Structures

    Get PDF
    Compared to Application-Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) provide reconfigurablity at the cost of lower performance and higher power consumption. Exploiting a large number of programmable switches, routing structures are mainly responsible for the performance limitation. Hence, employing more efficient switches can drastically improve the performance and reduce the power consumption of the FPGA. Resistive Random Access Memory (RRAM)-based switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. The lower RC delay of RRAM-based routing multiplexers, as compared to CMOS-based routing structures encourages us to reconsider the buffer distribution in FPGAs. This paper proposes an approach to reduce the number of buffers in the routing path of RRAM-based FPGAs. Our architectural simulations show that the use of RRAM switches improves the critical path delay by 56% as compared to CMOS switches in standard FPGA circuits at 45-nm technology node while, at the same time, the area and power are reduced, respectively, by 17% and 9%. By adapting the buffering scheme, an extra bonus of 9% for delay reduction, 5% for power reduction and 16% for area reduction can be obtained, as compared to the conventional buffering approach for RRAM-based FPGAs

    Via-switch FPGA with transistor-free programmability enabling energy-efficient near-memory parallel computation

    Get PDF
    We are developing field-programmable gate arrays (FPGAs) with a new non-volatile switch called via-switch. In via-switch FPGAs (VS-FPGAs), the via-switches required for reconfiguration are placed in the routing layer so that the entire transistor layer can be utilized for computing, and higher implementation density can be achieved compared to conventional SRAM FPGAs. Furthermore, since arithmetic units and memories for computing can be placed under the via-switch crossbar for routing, large-scale parallel operations can be realized where the memory and the arithmetic unit are adjacent to each other. These features enable operation with high energy efficiency. This article reports 65 nm prototype fabrication results and predicted the performance of the VS-FPGA designed for AI applications. We also present the developed application mapping flow and crossbar programming method. The VS-FPGA closes the gap between FPGA and application-specific integrated circuits (ASIC) with the performance advantage of the via-switch and via-switch copy scheme for FPGA-to-ASIC migration, contributing to the expansion of the FPGA usage

    GMS: Generic Memristive Structure for Non-Volatile FPGAs

    Get PDF
    The invention of the memristor enables new possibilities for computation and non-volatile memory storage. In this paper we propose a Generic Memristive Structure (GMS) for 3-D FPGA applications. The GMS cell is demonstrated to be utilized for steering logic useful for multiplexing signals, thus replacing the traditional pass-gates in FPGAs. Moreover, the same GMS cell can be utilized for programmable memories as a replacement for the SRAMs employed in the look-up tables of FPGAs. A fabricated GMS cell is presented and its use in FPGA architecture is demonstrated by the area and delay improvement for several architectural benchmarks

    Non-Volatile Memory Array Based Quantization- and Noise-Resilient LSTM Neural Networks

    Full text link
    In cloud and edge computing models, it is important that compute devices at the edge be as power efficient as possible. Long short-term memory (LSTM) neural networks have been widely used for natural language processing, time series prediction and many other sequential data tasks. Thus, for these applications there is increasing need for low-power accelerators for LSTM model inference at the edge. In order to reduce power dissipation due to data transfers within inference devices, there has been significant interest in accelerating vector-matrix multiplication (VMM) operations using non-volatile memory (NVM) weight arrays. In NVM array-based hardware, reduced bit-widths also significantly increases the power efficiency. In this paper, we focus on the application of quantization-aware training algorithm to LSTM models, and the benefits these models bring in terms of resilience against both quantization error and analog device noise. We have shown that only 4-bit NVM weights and 4-bit ADC/DACs are needed to produce equivalent LSTM network performance as floating-point baseline. Reasonable levels of ADC quantization noise and weight noise can be naturally tolerated within our NVMbased quantized LSTM network. Benchmark analysis of our proposed LSTM accelerator for inference has shown at least 2.4x better computing efficiency and 40x higher area efficiency than traditional digital approaches (GPU, FPGA, and ASIC). Some other novel approaches based on NVM promise to deliver higher computing efficiency (up to 4.7x) but require larger arrays with potential higher error rates.Comment: Published in: 2019 IEEE International Conference on Rebooting Computing (ICRC
    • 

    corecore