23 research outputs found
Compact and High-Performance TCAM Based on Scaled Double-Gate FeFETs
Ternary content addressable memory (TCAM), widely used in network routers and
high-associativity caches, is gaining popularity in machine learning and
data-analytic applications. Ferroelectric FETs (FeFETs) are a promising
candidate for implementing TCAM owing to their high ON/OFF ratio,
non-volatility, and CMOS compatibility. However, conventional single-gate
FeFETs (SG-FeFETs) suffer from relatively high write voltage, low endurance,
potential read disturbance, and face scaling challenges. Recently, a
double-gate FeFET (DG-FeFET) has been proposed and outperforms SG-FeFETs in
many aspects. This paper investigates TCAM design challenges specific to
DG-FeFETs and introduces a novel 1.5T1Fe TCAM design based on DG-FeFETs. A
2-step search with early termination is employed to reduce the cell area and
improve energy efficiency. A shared driver design is proposed to reduce the
peripherals area. Detailed analysis and SPICE simulation show that the 1.5T1Fe
DG-TCAM leads to superior search speed and energy efficiency. The 1.5T1Fe TCAM
design can also be built with SG-FeFETs, which achieve search latency and
energy improvement compared with 2FeFET TCAM.Comment: Accepted by Design Automation Conference (DAC) 202
X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs
Structured, or tabular, data is the most common format in data science. While
deep learning models have proven formidable in learning from unstructured data
such as images or speech, they are less accurate than simpler approaches when
learning from tabular data. In contrast, modern tree-based Machine Learning
(ML) models shine in extracting relevant information from structured data. An
essential requirement in data science is to reduce model inference latency in
cases where, for example, models are used in a closed loop with simulation to
accelerate scientific discovery. However, the hardware acceleration community
has mostly focused on deep neural networks and largely ignored other forms of
machine learning. Previous work has described the use of an analog content
addressable memory (CAM) component for efficiently mapping random forests. In
this work, we focus on an overall analog-digital architecture implementing a
novel increased precision analog CAM and a programmable network on chip
allowing the inference of state-of-the-art tree-based ML models, such as
XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology
show 119x lower latency at 9740x higher throughput compared with a
state-of-the-art GPU, with a 19W peak power consumption
TCA<i>m</i>M<sup>CogniGron</sup>::Energy Efficient Memristor-Based TCAM for Match-Action Processing
The Internet relies heavily on programmable match-action processors for matching network packets against locally available network rules and taking actions, such as forwarding and modification of network packets. This match-action process must be performed at high speed, i.e., commonly within one clock cycle, using a specialized memory unit called Ternary Content Addressable Memory (TCAM). Building on transistor-based CMOS designs, state-of-the-art TCAM architectures have high energy consumption and lack resilient designs for incorporating novel technologies for performing appropriate actions. In this article, we motivate the use of a novel fundamental component, the ‘Memristor’, for the development of TCAM architecture for match-action processing. Memristors can provide energy efficiency, non-volatility and better resource density as compared to transistors. We have proposed a novel memristor-based TCAM architecture called TCAmMCogniGron, built upon the voltage divider principle and requiring only two memristors and five transistors for storage and search operations compared to sixteen transistors in the traditional TCAM architecture. We analyzed its performance over an experimental data set of Nb-doped SrTiO3-based memristor. The analysis of TCAmMCogniGron showed promising power consumption statistics of 16 uW and 1 uW for match and mismatch operations along with twice the improvement in resources density as compared to the traditional architectures
Towards Energy Efficient Memristor-based TCAM for Match-Action Processing
Match-action processors play a crucial role of communicating end-users in the Internet by computing network paths and enforcing administrator policies. The computation process uses a specialized memory called Ternary Content Addressable Memory (TCAM) to store processing rules and use header information of network packets to perform a match within a single clock cycle. Currently, TCAM memories consume huge amounts of energy resources due to the use of traditional transistor-based CMOS technology. In this article, we motivate the use of a novel component, the memristor, for the development of a TCAM architecture. Memristors can provide energy efficiency, non-volatility, and better resource density as compared to transistors. We have proposed a novel memristor-based TCAM architecture built upon the voltage divider principle for energy efficient match-action processing. Moreover, we have tested the performance of the memristor-based TCAM architecture using the experimental data of a novel Nb-doped SrTiO3 memristor. Energy analysis of the proposed TCAM architecture for given memristor shows promising power consumption statistics of 16 μW for a match operation and 1 μW for a mismatch operation
In-memory computing with emerging memory devices: Status and outlook
Supporting data for "In-memory computing with emerging memory devices: status and outlook", submitted to APL Machine Learning
Memristor MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High Performance Search Engines
Large-capacity Content Addressable Memory (CAM) is a key element in a wide
variety of applications. The inevitable complexities of scaling MOS transistors
introduce a major challenge in the realization of such systems. Convergence of
disparate technologies, which are compatible with CMOS processing, may allow
extension of Moore's Law for a few more years. This paper provides a new
approach towards the design and modeling of Memristor (Memory resistor) based
Content Addressable Memory (MCAM) using a combination of memristor MOS devices
to form the core of a memory/compare logic cell that forms the building block
of the CAM architecture. The non-volatile characteristic and the nanoscale
geometry together with compatibility of the memristor with CMOS processing
technology increases the packing density, provides for new approaches towards
power management through disabling CAM blocks without loss of stored data,
reduces power dissipation, and has scope for speed improvement as the
technology matures.Comment: 10 pages, 11 figure