17 research outputs found

    Compact and High-Performance TCAM Based on Scaled Double-Gate FeFETs

    Full text link
    Ternary content addressable memory (TCAM), widely used in network routers and high-associativity caches, is gaining popularity in machine learning and data-analytic applications. Ferroelectric FETs (FeFETs) are a promising candidate for implementing TCAM owing to their high ON/OFF ratio, non-volatility, and CMOS compatibility. However, conventional single-gate FeFETs (SG-FeFETs) suffer from relatively high write voltage, low endurance, potential read disturbance, and face scaling challenges. Recently, a double-gate FeFET (DG-FeFET) has been proposed and outperforms SG-FeFETs in many aspects. This paper investigates TCAM design challenges specific to DG-FeFETs and introduces a novel 1.5T1Fe TCAM design based on DG-FeFETs. A 2-step search with early termination is employed to reduce the cell area and improve energy efficiency. A shared driver design is proposed to reduce the peripherals area. Detailed analysis and SPICE simulation show that the 1.5T1Fe DG-TCAM leads to superior search speed and energy efficiency. The 1.5T1Fe TCAM design can also be built with SG-FeFETs, which achieve search latency and energy improvement compared with 2FeFET TCAM.Comment: Accepted by Design Automation Conference (DAC) 202

    Analog Content-Addressable Memory from Complementary FeFETs

    Full text link
    To address the increasing computational demands of artificial intelligence (AI) and big data, compute-in-memory (CIM) integrates memory and processing units into the same physical location, reducing the time and energy overhead of the system. Despite advancements in non-volatile memory (NVM) for matrix multiplication, other critical data-intensive operations, like parallel search, have been overlooked. Current parallel search architectures, namely content-addressable memory (CAM), often use binary, which restricts density and functionality. We present an analog CAM (ACAM) cell, built on two complementary ferroelectric field-effect transistors (FeFETs), that performs parallel search in the analog domain with over 40 distinct match windows. We then deploy it to calculate similarity between vectors, a building block in the following two machine learning problems. ACAM outperforms ternary CAM (TCAM) when applied to similarity search for few-shot learning on the Omniglot dataset, yielding projected simulation results with improved inference accuracy by 5%, 3x denser memory architecture, and more than 100x faster speed compared to central processing unit (CPU) and graphics processing unit (GPU) per similarity search on scaled CMOS nodes. We also demonstrate 1-step inference on a kernel regression model by combining non-linear kernel computation and matrix multiplication in ACAM, with simulation estimates indicating 1,000x faster inference than CPU and GPU

    TCA<i>m</i>M<sup>CogniGron</sup>::Energy Efficient Memristor-Based TCAM for Match-Action Processing

    Get PDF
    The Internet relies heavily on programmable match-action processors for matching network packets against locally available network rules and taking actions, such as forwarding and modification of network packets. This match-action process must be performed at high speed, i.e., commonly within one clock cycle, using a specialized memory unit called Ternary Content Addressable Memory (TCAM). Building on transistor-based CMOS designs, state-of-the-art TCAM architectures have high energy consumption and lack resilient designs for incorporating novel technologies for performing appropriate actions. In this article, we motivate the use of a novel fundamental component, the ‘Memristor’, for the development of TCAM architecture for match-action processing. Memristors can provide energy efficiency, non-volatility and better resource density as compared to transistors. We have proposed a novel memristor-based TCAM architecture called TCAmMCogniGron, built upon the voltage divider principle and requiring only two memristors and five transistors for storage and search operations compared to sixteen transistors in the traditional TCAM architecture. We analyzed its performance over an experimental data set of Nb-doped SrTiO3-based memristor. The analysis of TCAmMCogniGron showed promising power consumption statistics of 16 uW and 1 uW for match and mismatch operations along with twice the improvement in resources density as compared to the traditional architectures

    A Novel Non-Volatile Inverter-based CiM: Continuous Sign Weight Transition and Low Power on-Chip Training

    Full text link
    In this work, we report a novel design, one-transistor-one-inverter (1T1I), to satisfy high speed and low power on-chip training requirements. By leveraging doped HfO2 with ferroelectricity, a non-volatile inverter is successfully demonstrated, enabling desired continuous weight transition between negative and positive via the programmable threshold voltage (VTH) of ferroelectric field-effect transistors (FeFETs). Compared with commonly used designs with the similar function, 1T1I uniquely achieves pure on-chip-based weight transition at an optimized working current without relying on assistance from off-chip calculation units for signed-weight comparison, facilitating high-speed training at low power consumption. Further improvements in linearity and training speed can be obtained via a two-transistor-one-inverter (2T1I) design. Overall, focusing on energy and time efficiencies, this work provides a valuable design strategy for future FeFET-based computing-in-memory (CiM)

    COHERENT/INCOHERENT MAGNETIZATION DYNAMICS OF NANOMAGNETIC DEVICES FOR ULTRA-LOW ENERGY COMPUTING

    Get PDF
    Nanomagnetic computing devices are inherently nonvolatile and show unique transfer characteristics while their switching energy requirements are on par, if not better than state of the art CMOS based devices. These characteristics make them very attractive for both Boolean and non-Boolean computing applications. Among different strategies employed to switch nanomagnetic computing devices e.g. magnetic field, spin transfer torque, spin orbit torque etc., strain induced switching has been shown to be among the most energy efficient. Strain switched nanomagnetic devices are also amenable for non-Boolean computing applications. Such strain mediated magnetization switching, termed here as “Straintronics”, is implemented by switching the magnetization of the magnetic layer of a magnetostrictive-piezoelectric nanoscale heterostructure by applying an electric field in the underlying piezoelectric layer. The modes of “straintronic” switching: coherent vs. incoherent switching of spins can affect device performance such as speed, energy dissipation and switching error in such devices. There was relatively little research performed on understanding the switching mechanism (coherent vs. incoherent) in xiv straintronic devices and their adaptation for non-Boolean computing, both of which have been studied in this thesis. Detailed studies of the effects of nanomagnet geometry and size on the coherence of the switching process and ultimately device performance of such strain switched nanomagnetic devices have been performed. These studies also contributed in optimizing designs for low energy, low dynamic error operation of straintronic logic devices and identified avenues for further research. A Novel non-Boolean “straintronic” computing device (Ternary Content Addressable Memory, abbreviated as TCAM) has been proposed and evaluated through numerical simulations. This device showed significant improvement over existing CMOS device based TCAM implementation in terms of scaling, energy-delay product, operational simplicity etc. The experimental part of this thesis answered a very fundamental question in strain induced magnetization rotation. Specifically, this experiment studied the variation in magnetization orientation for strain induced magnetization rotation along the thickness of a magnetostrictive thin film using polarized neutron reflectometry and demonstrated non-uniform magnetization rotation along the thickness of the sample. Additional experimental work was performed to lay the groundwork for ultra-low voltage straintronic switching demonstration. Preliminary sample fabrication and characterization that can potentially lead to low voltage (~10-100 mV) operation and local clocking of such devices has been performed

    FeFET Based Nonvolatile TCAM and DRAM Development

    Get PDF
    Ferroelectric Field Effect Transistor (FeFET) is a promising nonvolatile device which provides high integration density, fast programming speed, and excellent CMOS compatibility. In general, the non-volatility of FeFET is impacted by its physical structure and there is a trade-off between data retention time and device endurance. To improve the cell endurance, for example, the ferroelectric layer of FeFET needs to be programmed to a low polarization level, leading to a short retention time. In ferroelectric DRAM (FeDRAM) design, degradation in FeFET retention time and write-read disturbance requires the FeDRAM cells to be periodically refreshed in order to prevent data loss. In this work, I propose a novel adaptive refreshing and read voltage control scheme to minimize the energy overheads associated with FeDRAM refreshing while still achieve high cell access reliability. In addition to the DRAM application FeFET based TCAM memory is also studied. TCAM (ternary content addressable memory) is a special memory type that can compare input search data with stored data, and return location (sometime, the associated content) of matched data. TCAM is widely used in microprocessor designs as well as communication chip, e.g., IP-routing. Following technology advances of emerging nonvolatile memories (eNVM), applying eNVM to TCAM designs becomes attractive to achieve high density and low standby power. In this work, I examined the applications of three promising eNVM tech-nologies, i.e., magnetic tunneling junction (MTJ), memristor, and ferroelectric memory field effect transistor (FeMFET), in the design of nonvolatile TCAM cells. All these technologies can achieve close-to-zero standby power though each of them has very different pros and cons
    corecore