Analog Content-Addressable Memory from Complementary FeFETs
To address the increasing computational demands of artificial intelligence
(AI) and big data, compute-in-memory (CIM) integrates memory and processing
units into the same physical location, reducing the time and energy overhead of
the system. Despite advancements in non-volatile memory (NVM) for matrix
multiplication, other critical data-intensive operations, like parallel search,
have been overlooked. Current parallel search architectures, namely
content-addressable memory (CAM), often use binary, which restricts density and
functionality. We present an analog CAM (ACAM) cell, built on two complementary
ferroelectric field-effect transistors (FeFETs), that performs parallel search
in the analog domain with over 40 distinct match windows. We then deploy it to
calculate similarity between vectors, a core operation in the two machine
learning problems that follow. ACAM outperforms ternary CAM (TCAM) when applied to
similarity search for few-shot learning on the Omniglot dataset: projected
simulation results show 5% higher inference accuracy, a 3x denser memory
architecture, and more than 100x faster speed per similarity search than a
central processing unit (CPU) or graphics processing unit (GPU) at scaled
CMOS nodes. We also demonstrate 1-step inference on a kernel
regression model by combining non-linear kernel computation and matrix
multiplication in ACAM, with simulation estimates indicating 1,000x faster
inference than a CPU or GPU.
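The match-window mechanism can be modeled in a few lines of software: each analog cell stores a voltage window, and a stored row's similarity to a query is the number of query entries that fall inside that row's windows. The sketch below is a minimal illustration of this idea; the array sizes, window width, and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def acam_match_counts(lo, hi, query):
    # A cell "matches" when the query value falls inside its window;
    # a row's score is its number of matching cells (done in parallel
    # in the actual analog hardware).
    in_window = (query >= lo) & (query <= hi)
    return in_window.sum(axis=1)

# Three stored 4-dimensional prototypes, each cell given a narrow window.
centers = np.array([[0.1, 0.2, 0.3, 0.4],
                    [0.5, 0.6, 0.7, 0.8],
                    [0.9, 0.1, 0.5, 0.3]])
half = 0.0125  # illustrative half-width, roughly 40 windows per unit range
lo, hi = centers - half, centers + half

query = centers[1] + 0.005          # small analog perturbation of row 1
scores = acam_match_counts(lo, hi, query)
best = int(np.argmax(scores))       # index of the best-matching stored vector
```

In a few-shot setting, the stored rows would hold class prototypes and the highest-scoring row selects the predicted class, which is why the number of distinct match windows directly bounds the search precision.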
A Novel Non-Volatile Inverter-based CiM: Continuous Sign Weight Transition and Low Power on-Chip Training
In this work, we report a novel design, one-transistor-one-inverter (1T1I),
to meet the high-speed, low-power requirements of on-chip training. By
leveraging doped HfO2 with ferroelectricity, a non-volatile inverter is
successfully demonstrated, enabling desired continuous weight transition
between negative and positive via the programmable threshold voltage (VTH) of
ferroelectric field-effect transistors (FeFETs). Compared with commonly used
designs of similar functionality, 1T1I uniquely achieves purely on-chip
weight transition at an optimized working current, without relying on
off-chip calculation units for signed-weight comparison, facilitating
high-speed training with low power consumption. Further improvements in linearity
and training speed can be obtained via a two-transistor-one-inverter (2T1I)
design. Overall, focusing on energy and time efficiencies, this work provides a
valuable design strategy for future FeFET-based computing-in-memory (CiM).
In-memory computing with emerging memory devices: Status and outlook
Supporting data for "In-memory computing with emerging memory devices: status and outlook", submitted to APL Machine Learning.
Axp: A hw-sw co-design pipeline for energy-efficient approximated convnets via associative matching
The reduction in energy consumption is key for deep neural networks (DNNs) to ensure usability and reliability, whether they are deployed on low-power end-nodes with limited resources or high-performance platforms that serve large pools of users. Leveraging the over-parametrization shown by many DNN models, convolutional neural networks (ConvNets) in particular, energy efficiency can be improved substantially while preserving model accuracy. The solution proposed in this work exploits the intrinsic redundancy of ConvNets to maximize the reuse of partial arithmetic results during the inference stages. Specifically, the weight-set of a given ConvNet is discretized through a clustering procedure such that the largest possible number of inner multiplications fall into predefined bins; this allows an off-line computation of the most frequent results, which in turn can be stored locally and retrieved when needed during the forward pass. Such a reuse mechanism leads to remarkable energy savings with the aid of a custom processing element (PE) that integrates an associative memory with a standard floating-point unit (FPU). Moreover, the adoption of an approximate associative rule based on a partial bit-match increases the hit rate over the pre-computed results, maximizing the energy reduction even further. Results collected on a set of ConvNets trained for computer vision and speech processing tasks reveal that the proposed associative-based hw-sw co-design achieves up to 77% in energy savings with less than 1% in accuracy loss.
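The reuse mechanism described above can be sketched in software: discretize weights and activations into a small number of bins, precompute every bin-pair product once off-line, and replace each inference multiplication with a table lookup. The sketch below is a simplified stand-in, assuming uniform quantization in place of the paper's clustering procedure and omitting the partial bit-match rule; bin counts and value ranges are illustrative.

```python
import numpy as np

# Illustrative bin counts; the paper clusters weights, here plain
# uniform quantization stands in for that step.
W_BINS, X_BINS = 16, 16

def quantize(v, lo, hi, bins):
    # Map each value to its bin index over [lo, hi).
    return np.clip(((v - lo) / (hi - lo) * bins).astype(int), 0, bins - 1)

# Off-line phase: compute every bin-centroid product once and store it
# (this table plays the role of the associative memory in the PE).
w_cent = -1 + (np.arange(W_BINS) + 0.5) * 2 / W_BINS   # weights in [-1, 1]
x_cent = (np.arange(X_BINS) + 0.5) / X_BINS            # activations in [0, 1]
product_table = np.outer(w_cent, x_cent)

def approx_dot(w, x):
    # Forward pass: every multiply becomes a lookup of a precomputed result.
    wi = quantize(w, -1.0, 1.0, W_BINS)
    xi = quantize(x, 0.0, 1.0, X_BINS)
    return product_table[wi, xi].sum()

w = np.array([0.30, -0.70, 0.50])
x = np.array([0.20, 0.40, 0.90])
exact = float(w @ x)
approx = float(approx_dot(w, x))
```

The accuracy/energy trade-off is governed by the bin counts: more bins shrink the quantization error but enlarge the lookup table the PE must hold.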
GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping
Nanopore sequencing is a widely-used high-throughput genome sequencing
technology that can sequence long fragments of a genome into raw electrical
signals at low cost. Nanopore sequencing requires two computationally-costly
processing steps for accurate downstream genome analysis. The first step,
basecalling, translates the raw electrical signals into nucleotide bases (i.e.,
A, C, G, T). The second step, read mapping, finds the correct location of a
read in a reference genome. In existing genome analysis pipelines, basecalling
and read mapping are executed separately. We observe in this work that such
separate execution of the two most time-consuming steps inherently leads to (1)
significant data movement and (2) redundant computations on the data, slowing
down the genome analysis pipeline. This paper proposes GenPIP, an in-memory
genome analysis accelerator that tightly integrates basecalling and read
mapping. GenPIP improves the performance of the genome analysis pipeline with
two key mechanisms: (1) in-memory fine-grained collaborative execution of the
major genome analysis steps in parallel; (2) a new early-rejection technique
that promptly stops the analysis of low-quality and unmapped reads, reducing
wasted computation. Our
experiments show that, for the execution of the genome analysis pipeline,
GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with
negligible accuracy loss compared to the state-of-the-art software genome
analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design
that combines state-of-the-art in-memory basecalling and read mapping
accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings.
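The early-rejection idea can be illustrated with a toy pipeline: basecall a read chunk by chunk and abandon it as soon as the running quality estimate drops below a cutoff, instead of finishing basecalling and only then discovering the read is unmappable. The cutoff value, quality scores, and function name below are illustrative assumptions, not figures from the paper.

```python
# Toy model of early rejection: quality is monitored per chunk and
# the read is dropped the moment the running average falls below
# the cutoff, so no further basecalling or mapping work is spent on it.
QUALITY_CUTOFF = 0.8  # illustrative threshold

def analyze_read(chunk_qualities):
    processed = 0
    total = 0.0
    for q in chunk_qualities:
        processed += 1
        total += q
        if total / processed < QUALITY_CUTOFF:
            return "rejected", processed   # stop early, skip read mapping
    return "mapped", processed             # full read proceeds to mapping

good = [0.95, 0.90, 0.92, 0.88]
bad = [0.90, 0.50, 0.40, 0.45]             # quality collapses after chunk 1

good_result = analyze_read(good)           # ('mapped', 4)
bad_result = analyze_read(bad)             # ('rejected', 2)
```

In GenPIP the equivalent decision happens inside the memory arrays, which is what removes the data movement between the basecalling and mapping stages.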
High-Density Solid-State Memory Devices and Technologies
This Special Issue aims to examine high-density solid-state memory devices and technologies from various standpoints in an attempt to foster their continuous success in the future. Considering that broadening of the range of applications will likely offer different types of solid-state memories their chance in the spotlight, the Special Issue is not focused on a specific storage solution but rather embraces all the most relevant solid-state memory devices and technologies currently on stage. The subjects dealt with in this Special Issue are likewise widespread, ranging from process and design issues/innovations to the experimental and theoretical analysis of the operation, and from the performance and reliability of memory devices and arrays to the exploitation of solid-state memories to pursue new computing paradigms.