19 research outputs found

    A Ultra-Low-Power FPGA Based on Monolithically Integrated RRAMs (invited)

    Get PDF
    Field Programmable Gate Arrays (FPGAs) rely heavily on complex routing architectures. The routing structures use programmable switches and account for a significant share in the total area, delay and power consumption numbers. With the ability of being monolithically integrated with CMOS chips, Resistive Random Access Memories (RRAMs) enable high-performance routing architectures through the replacement of Static Random Access Memory (SRAM)-based programming switches. Exploiting the very low on-resistance state achievable by RRAMs as well as the improved tolerance to power supply reduction, RRAM-based routing multiplexers can be used to significantly reduce the power consumption of FPGA systems with no performance compromises. By evaluating the opportunities of ultra-low-power RRAM-based FPGAs at the system level, we see an improvement of 12%, 26% and 81% in area, delay and power consumption at a mature technology node

    Neuromorphic object localization using resistive memories and ultrasonic transducers

    Full text link
    Real-world sensory-processing applications require compact, low-latency, and low-power computing systems. Enabled by their in-memory event-driven computing abilities, hybrid memristive-Complementary Metal-Oxide Semiconductor neuromorphic architectures provide an ideal hardware substrate for such tasks. To demonstrate the full potential of such systems, we propose and experimentally demonstrate an end-to-end sensory processing solution for a real-world object localization application. Drawing inspiration from the barn owl’s neuroanatomy, we developed a bio-inspired, event-driven object localization system that couples state-of-the-art piezoelectric micromachined ultrasound transducer sensors to a neuromorphic resistive memories-based computational map. We present measurement results from the fabricated system comprising resistive memories-based coincidence detectors, delay line circuits, and a full-custom ultrasound sensor. We use these experimental results to calibrate our system-level simulations. These simulations are then used to estimate the angular resolution and energy efficiency of the object localization model. The results reveal the potential of our approach, evaluated in orders of magnitude greater energy efficiency than a microcontroller performing the same task

    A Study on Buffer Distribution for RRAM-based FPGA Routing Structures

    Get PDF
    Compared to Application-Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) provide reconfigurablity at the cost of lower performance and higher power consumption. Exploiting a large number of programmable switches, routing structures are mainly responsible for the performance limitation. Hence, employing more efficient switches can drastically improve the performance and reduce the power consumption of the FPGA. Resistive Random Access Memory (RRAM)-based switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. The lower RC delay of RRAM-based routing multiplexers, as compared to CMOS-based routing structures encourages us to reconsider the buffer distribution in FPGAs. This paper proposes an approach to reduce the number of buffers in the routing path of RRAM-based FPGAs. Our architectural simulations show that the use of RRAM switches improves the critical path delay by 56% as compared to CMOS switches in standard FPGA circuits at 45-nm technology node while, at the same time, the area and power are reduced, respectively, by 17% and 9%. By adapting the buffering scheme, an extra bonus of 9% for delay reduction, 5% for power reduction and 16% for area reduction can be obtained, as compared to the conventional buffering approach for RRAM-based FPGAs

    Fast Logic Synthesis for RRAM-based In-Memory Computing using Majority-Inverter Graphs

    Full text link

    Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

    Full text link
    The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources on edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement a neural network on limited hardware resources. Quantization is one of the most efficient network compression techniques allowing to reduce the memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives along with an IMC-based QNN hardware roadmap are provided

    Circuit Design, Architecture and CAD for RRAM-based FPGAs

    Get PDF
    Field Programmable Gate Arrays (FPGAs) have been indispensable components of embedded systems and datacenter infrastructures. However, energy efficiency of FPGAs has become a hard barrier preventing their expansion to more application contexts, due to two physical limitations: (1) The massive usage of routing multiplexers causes delay and power overheads as compared to ASICs. To reduce their power consumption, FPGAs have to operate at low supply voltage but sacrifice performance because the transistors drive degrade when working voltage decreases. (2) Using volatile memory technology forces FPGAs to lose configurations when powered off and to be reconfigured at each power on. Resistive Random Access Memories (RRAMs) have strong potentials in overcoming the physical limitations of conventional FPGAs. First of all, RRAMs grant FPGAs non-volatility, enabling FPGAs to be "Normally powered off, Instantly powered on". Second, by combining functionality of memory and pass-gate logic in one unique device, RRAMs can greatly reduce area and delay of routing elements. Third, when RRAMs are embedded into datpaths, the performance of circuits can be independent from their working voltage, beyond the limitations of CMOS circuits. However, researches and development of RRAM-based FPGAs are in their infancy. Most of area and performance predictions were achieved without solid circuit-level simulations and sophisticated Computer Aided Design (CAD) tools, causing the predicted improvements to be less convincing. In this thesis,we present high-performance and low-power RRAM-based FPGAs fromtransistorlevel circuit designs to architecture-level optimizations and CAD tools, using theoretical analysis, industrial electrical simulators and novel CAD tools. We believe that this is the first systematic study in the field, covering: From a circuit design perspective, we propose efficient RRAM-based programming circuits and routing multiplexers through both theoretical analysis and electrical simulations. The proposed 4T(ransitor)1R(RAM) programming structure demonstrates significant improvements in programming current, when compared to most popular 2T1R programming structure. 4T1R-based routingmultiplexer designs are proposed by considering various physical design parasitics, such as intrinsic capacitance of RRAMs and wells doping organization. The proposed 4T1R-based multiplexers outperformbest CMOS implementations significantly in area, delay and power at both nominal and near-Vt regime. From a CAD perspective, we develop a generic FPGA architecture exploration tool, FPGASPICE, modeling a full FPGA fabric with SPICE and Verilog netlists. FPGA-SPICE provides different levels of testbenches and techniques to split large SPICE netlists, in order to obtain better trade-off between simulation time and accuracy. FPGA-SPICE can capture area and power characteristics of SRAM-based and RRAM-based FPGAs more accurately than the currently best analyticalmodels. From an architecture perspective, we propose architecture-level optimizations for RRAMbased FPGAs and quantify their minimumrequirements for RRAM devices. Compared to the best SRAM-based FPGAs, an optimized RRAM-based FPGA architecture brings significant reduction in area, delay and power respectively. In particular, RRAM-based FPGAs operating in the near-Vt regime demonstrate a 5x power improvement without delay overhead as compared to optimized SRAM-based FPGA operating at nominal working voltage

    Signaling in 3-D integrated circuits, benefits and challenges

    Get PDF
    Three-dimensional (3-D) or vertical integration is a design and packaging paradigm that can mitigate many of the increasing challenges related to the design of modern integrated systems. 3-D circuits have recently been at the spotlight, since these circuits provide a potent approach to enhance the performance and integrate diverse functions within amulti-plane stack. Clock networks consume a great portion of the power dissipated in a circuit. Therefore, designing a low-power clock network in synchronous circuits is an important task. This requirement is stricter for 3-D circuits due to the increased power densities. Synchronization issues can be more challenging for 3-D circuits since a clock path can spread across several planes with different physical and electrical characteristics. Consequently, designing low power clock networks for 3-D circuits is an important issue. Resonant clock networks are considered efficient low-power alternatives to conventional clock distribution schemes. These networks utilize additional inductive circuits to reduce power while delivering a full swing clock signal to the sink nodes. In this research, a design method to apply resonant clocking to synthesized clock trees is proposed. Manufacturing processes for 3-D circuits include some additional steps as compared to standard CMOS processes which makes 3-D circuits more susceptible to manufacturing defects and lowers the overall yield of the bonded 3-D stack. Testing is another complicated task for 3-D ICs, where pre-bond test is a prerequisite. Pre-bond testability, in turn, presents new challenges to 3-D clock network design primarily due to the incomplete clock distribution networks prior to the bonding of the planes. A design methodology of resonant 3-D clock networks that support wireless pre-bond testing is introduced. To efficiently address this issue, inductive links are exploited to wirelessly transmit the clock signal to the disjoint resonant clock networks. The inductors comprising the LC tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test. Recent FPGAs are quite complex circuits which provide reconfigurablity at the cost of lower performance and higher power consumption as compared to ASIC circuits. Exploiting a large number of programmable switches, routing structures are mainly responsible for performance degradation in FPAGs. Employing 3-D technology can providemore efficient switches which drastically improve the performance and reduce the power consumption of the FPGA. RRAM switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. Along with the configurable switches, buffers are the other important element of the FPGAs routing structure. Different characteristics of RRAM switches change the properties of signal paths in RRAM-based FPGAs. The on resistance of RRAMswitches is considerably lower than CMOS pass gate switches which results in lower RC delay for RRAM-based routing paths. This different nature in critical path and signal delay in turn affect the need for intermediate buffers. Thus the buffer allocation should be reconsidered. In the last part of this research, the effect of intermediate buffers on signal propagation delay is studied and a modified buffer allocation scheme for RRAM-based FPGA routing path is proposed

    Designing energy-efficient sub-threshold logic circuits using equalization and non-volatile memory circuits using memristors

    Full text link
    The very large scale integration (VLSI) community has utilized aggressive complementary metal-oxide semiconductor (CMOS) technology scaling to meet the ever-increasing performance requirements of computing systems. However, as we enter the nanoscale regime, the prevalent process variation effects degrade the CMOS device reliability. Hence, it is increasingly essential to explore emerging technologies which are compatible with the conventional CMOS process for designing highly-dense memory/logic circuits. Memristor technology is being explored as a potential candidate in designing non-volatile memory arrays and logic circuits with high density, low latency and small energy consumption. In this thesis, we present the detailed functionality of multi-bit 1-Transistor 1-memRistor (1T1R) cell-based memory arrays. We present the performance and energy models for an individual 1T1R memory cell and the memory array as a whole. We have considered TiO2- and HfOx-based memristors, and for these technologies there is a sub-10% difference between energy and performance computed using our models and HSPICE simulations. Using a performance-driven design approach, the energy-optimized TiO2-based RRAM array consumes the least write energy (4.06 pJ/bit) and read energy (188 fJ/bit) when storing 3 bits/cell for 100 nsec write and 1 nsec read access times. Similarly, HfOx-based RRAM array consumes the least write energy (365 fJ/bit) and read energy (173 fJ/bit) when storing 3 bits/cell for 1 nsec write and 200 nsec read access times. On the logic side, we investigate the use of equalization techniques to improve the energy efficiency of digital sequential logic circuits in sub-threshold regime. We first propose the use of a variable threshold feedback equalizer circuit with combinational logic blocks to mitigate the timing errors in digital logic designed in sub-threshold regime. This mitigation of timing errors can be leveraged to reduce the dominant leakage energy by scaling supply voltage or decreasing the propagation delay. At the fixed supply voltage, we can decrease the propagation delay of the critical path in a combinational logic block using equalizer circuits and, correspondingly decrease the leakage energy consumption. For a 8-bit carry lookahead adder designed in UMC 130 nm process, the operating frequency can be increased by 22.87% (on average), while reducing the leakage energy by 22.6% (on average) in the sub-threshold regime. Overall, the feedback equalization technique provides up to 35.4% lower energy-delay product compared to the conventional non-equalized logic. We also propose a tunable adaptive feedback equalizer circuit that can be used with sequential digital logic to mitigate the process variation effects and reduce the dominant leakage energy component in sub-threshold digital logic circuits. For a 64-bit adder designed in 130 nm our proposed approach can reduce the normalized delay variation of the critical path delay from 16.1% to 11.4% while reducing the energy-delay product by 25.83% at minimum energy supply voltage. In addition, we present detailed energy-performance models of the adaptive feedback equalizer circuit. This work serves as a foundation for the design of robust, energy-efficient digital logic circuits in sub-threshold regime

    Virtualizing Reconfigurable Architectures: From Fpgas To Beyond

    Get PDF
    With field-programmable gate arrays (FPGAs) being widely deployed in data centers to enhance the computing performance, an efficient virtualization support is required to fully unleash the potential of cloud FPGAs. However, the system support for FPGAs in the context of the cloud environment is still in its infancy, which leads to a low resource utilization due to the tight coupling between compilation and resource allocation. Moreover, the system support proposed in existing works is limited to a homogeneous FPGA cluster comprising identical FPGA devices, which is hard to be extended to a heterogeneous FPGA cluster that comprises multiple types of FPGAs. As the FPGA cloud is expected to become increasingly heterogeneous due to the hardware rolling upgrade strategy, it is necessary to provide efficient virtualization support for the heterogeneous FPGA cluster. In this dissertation, we first identify three pairs of conflicting requirements from runtime management and offline compilation, which are related to the tradeoff between flexibility and efficiency. These conflicting requirements are the fundamental reason why the single-level abstraction proposed in prior works for the homogeneous FPGA cluster cannot be trivially extended to the heterogeneous cluster. To decouple these conflicting requirements, we provide a two-level system abstraction. Specifically, the high-level abstraction is FPGA-agnostic and provides a simple and homogeneous view of the FPGA resources to simplify the runtime management and maximize the flexibility. On the contrary, the low-level abstraction is FPGA-specific and exposes sufficient low-level hardware details to the compilation framework to ensure the mapping quality and maximize the efficiency. This generic two-level system abstraction can also be specialized to the homogeneous FPGA cluster and/or be extended to leverage application-specific information to further improve the efficiency. We also develop a compilation framework and a modular runtime system with a heuristic-based runtime management policy to support this two-level system abstraction. By enabling a dynamic FPGA sharing at the sub-FPGA granularity, the proposed virtualization solution can deploy 1.62x more applications using the same amount of FPGA resources and reduce the compilation time by 22.6% (perform as many compilation tasks in parallel as possible) with an acceptable virtualization overhead, i.e., Finally, we use Liquid Silicon as a case study to show that the proposed virtualization solution can be extended to other spatial reconfigurable architectures. Liquid Silicon is a homogeneous reconfigurable architecture enabled by the non-volatile memory technology (i.e., RRAM). It extends the configuration capability of existing FPGAs from computation to the whole spectrum ranging from computation to data storage. It allows users to better customize hardware by flexibly partitioning hardware resources between computation and memory based on the actual usage. Instead of naively applying the proposed virtualization solution onto Liquid Silicon, we co-optimize the system abstraction and Liquid Silicon architecture to improve the performance

    LOW POWER CIRCUITS DESIGN USING RESISTIVE NON-VOLATILE MEMORIES

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore