15 research outputs found
Content Addressable Memories and Transformable Logic Circuits Based on Ferroelectric Reconfigurable Transistors for In-Memory Computing
As a promising alternative to the von Neumann architecture, in-memory
computing can deliver high computing capacity while consuming little power.
Content-addressable memory (CAM) implements pattern matching and distance
measurement in memory with massive parallelism, making it highly desirable
for data-intensive applications. In this paper, we
propose and demonstrate a novel 1-transistor-per-bit CAM based on the
ferroelectric reconfigurable transistor. By exploiting the switchable polarity
of the ferroelectric reconfigurable transistor, XOR/XNOR-like matching
operation in CAM can be realized in a single transistor. By eliminating the
need for the complementary circuit, these non-volatile CAMs based on
reconfigurable transistors can offer a significant improvement in area and
energy efficiency compared to conventional CAMs. NAND and NOR arrays of CAM
cells are also demonstrated, which enable multi-bit matching in a single read
operation. In addition, the NOR array of CAM cells effectively measures the
Hamming distance between the input query and stored entries. Furthermore,
utilizing the switchable polarity of these ferroelectric Schottky barrier
transistors, we demonstrate reconfigurable logic gates with NAND/NOR dual
functions, whose input-output mapping can be transformed in real time without
changing the layout. These reconfigurable circuits will serve as important
building blocks for high-density data-stream processors and reconfigurable
Application-Specific Integrated Circuits (r-ASICs). The CAMs and transformable
logic gates based on ferroelectric reconfigurable transistors will find broad
use in data-intensive applications, from image processing to machine learning
and artificial intelligence.
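The matching operations described above can be sketched in software. This is a behavioral sketch, not a circuit model: each CAM cell performs an XNOR-like comparison, a NAND-style array matches a word only if every cell matches, and a NOR-style array counts mismatches, which equals the Hamming distance. All function names are illustrative.

```python
def cell_match(stored_bit: int, query_bit: int) -> int:
    """XNOR-like match of one CAM cell: 1 on match, 0 on mismatch."""
    return 1 if stored_bit == query_bit else 0

def nand_array_match(stored_word, query_word) -> bool:
    """NAND-style array: the word matches only if every cell matches."""
    return all(cell_match(s, q) for s, q in zip(stored_word, query_word))

def nor_array_hamming(stored_word, query_word) -> int:
    """NOR-style array: the count of mismatching cells is the Hamming distance."""
    return sum(1 - cell_match(s, q) for s, q in zip(stored_word, query_word))

entry = [1, 0, 1, 1]
print(nand_array_match(entry, [1, 0, 1, 1]))   # True: exact match
print(nor_array_hamming(entry, [1, 1, 1, 0]))  # 2: two bits differ
```

In the hardware, the XNOR happens inside a single reconfigurable transistor per bit, and the whole array evaluates every stored entry in one read; the loops here only mimic that parallel behavior sequentially.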
In-memory computing with emerging memory devices: Status and outlook
Supporting data for "In-memory computing with emerging memory devices: status and outlook", submitted to APL Machine Learning
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
Neuromorphic computing systems comprise networks of neurons that use
asynchronous events for both computation and communication. This type of
representation offers several advantages in terms of bandwidth and power
consumption in neuromorphic electronic systems. However, managing the traffic
of asynchronous events in large scale systems is a daunting task, both in terms
of circuit complexity and memory requirements. Here we present a novel routing
methodology that employs both hierarchical and mesh routing strategies and
combines heterogeneous memory structures for minimizing both memory
requirements and latency, while maximizing programming flexibility to support a
wide range of event-based neural network architectures, through parameter
configuration. We validated the proposed scheme in a prototype multi-core
neuromorphic processor chip that employs hybrid analog/digital circuits for
emulating synapse and neuron dynamics together with asynchronous digital
circuits for managing the address-event traffic. We present a theoretical
analysis of the proposed connectivity scheme, describe the methods and circuits
used to implement it, and characterize the prototype chip. Finally, we
demonstrate the use of the neuromorphic processor with a convolutional neural
network for the real-time classification of visual symbols being flashed to a
dynamic vision sensor (DVS) at high speed.
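The inter-chip part of the routing described above can be sketched in software. This is a minimal sketch assuming dimension-order (X-then-Y) routing between chips on a 2D mesh; the coordinate scheme and helper names are illustrative, not the chip's actual address-event format.

```python
def mesh_next_hop(src, dst):
    """One step of dimension-order routing: move along X first, then Y."""
    (sx, sy), (dx, dy) = src, dst
    if sx != dx:
        return (sx + (1 if dx > sx else -1), sy)
    if sy != dy:
        return (sx, sy + (1 if dy > sy else -1))
    return src  # already at the destination chip

def route(src, dst):
    """Full hop-by-hop path an event would take from source to destination chip."""
    path = [src]
    while path[-1] != dst:
        path.append(mesh_next_hop(path[-1], dst))
    return path

print(route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Inside each chip, a hierarchical router would then look up the target cores and neurons from on-chip memory; combining the two levels is what keeps routing tables small while preserving flexibility.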
Long-Term Memory for Cognitive Architectures: A Hardware Approach Using Resistive Devices
A cognitive agent capable of reliably performing complex tasks over a long time will acquire a large store of knowledge. To interact with changing circumstances, the agent will need to quickly search and retrieve knowledge relevant to its current context. Real-time knowledge search and cognitive processing like this are a challenge for conventional computers, which are not optimised for such tasks. This thesis describes a new content-addressable memory, based on resistive devices, that can perform massively parallel knowledge search in the memory array. The fundamental circuit block that supports this capability is a memory cell that closely couples comparison logic with non-volatile storage. By using resistive devices instead of transistors in both the comparison circuit and the storage elements, this cell improves area density by over an order of magnitude compared to state-of-the-art CMOS implementations. The resulting memory does not need power to maintain stored information, and is therefore well suited to cognitive agents with large long-term memories. The memory incorporates activation circuits, which bias the knowledge retrieval process according to past memory access patterns. This is achieved by approximating the widely used base-level activation function, using resistive devices to store, maintain, and compare activation values. By distributing an instance of this circuit to every row in memory, the activations of all memory objects can be updated in parallel. A test using the word sense disambiguation task shows that this circuit-based activation model incurs only a small loss in accuracy compared to exact base-level calculations. A variation of spreading activation can also be achieved in memory. Memory objects are encoded with high-dimensional vectors that create associations between correlated representations. By storing these high-dimensional vectors in the new content-addressable memory, activation can be spread to related objects during search operations. 
The new memory is scalable, power- and area-efficient, and performs operations in parallel that are infeasible in real time for a sequential processor with a conventional memory hierarchy. Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
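The base-level activation function that the thesis's circuits approximate is, in the ACT-R tradition, B = ln(sum over past accesses j of t_j^(-d)), where t_j is the time since access j and d is a decay constant (commonly 0.5). A minimal sketch of the exact calculation; the access ages below are illustrative.

```python
import math

def base_level_activation(access_ages, d=0.5):
    """Exact base-level activation: B = ln(sum of t_j^(-d)) over past accesses.

    access_ages: times elapsed since each past access of the memory object.
    Recent and frequent accesses yield a higher activation.
    """
    return math.log(sum(t ** -d for t in access_ages))

# A recently and frequently accessed object outranks a stale one.
recent = base_level_activation([1.0, 3.0, 10.0])
stale = base_level_activation([50.0, 120.0])
print(recent > stale)  # True
```

The hardware version distributes one such calculation per memory row, so every stored object's activation can be updated and compared in parallel rather than looping as above.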
Adaptive extreme edge computing for wearable devices
Wearable devices are a fast-growing technology with impact on personal healthcare for both society and economy. Given the widespread deployment of sensors in pervasive and distributed networks, power consumption, processing speed, and system adaptation are vital in future smart wearable devices. The visioning and forecasting of how to bring computation to the edge in smart sensors have already begun, with the aspiration of providing adaptive extreme edge computing. Here, we provide a holistic view of hardware and theoretical solutions for smart wearable devices that can guide research in this pervasive computing era. We propose various solutions for biologically plausible models for continual learning in neuromorphic computing technologies for wearable sensors. To envision this concept, we provide a systematic outline of the prospective low-power, low-latency scenarios expected for wearable sensors on neuromorphic platforms. We then describe the vital potential landscapes of neuromorphic processors exploiting complementary metal-oxide-semiconductor (CMOS) and emerging memory technologies (e.g., memristive devices). Furthermore, we evaluate the requirements for edge computing within wearable devices in terms of footprint, power consumption, latency, and data size. We additionally investigate the challenges beyond neuromorphic computing hardware, algorithms, and devices that could impede the enhancement of adaptive edge computing in smart wearable devices.
Analog Content-Addressable Memory from Complementary FeFETs
To address the increasing computational demands of artificial intelligence
(AI) and big data, compute-in-memory (CIM) integrates memory and processing
units into the same physical location, reducing the time and energy overhead of
the system. Despite advancements in non-volatile memory (NVM) for matrix
multiplication, other critical data-intensive operations, like parallel search,
have been overlooked. Current parallel search architectures, namely
content-addressable memory (CAM), often use binary encoding, which restricts
density and functionality. We present an analog CAM (ACAM) cell, built on two
complementary ferroelectric field-effect transistors (FeFETs), that performs
parallel search in the analog domain with over 40 distinct match windows. We
then deploy it to calculate similarity between vectors, a building block of
the two machine learning problems that follow. Applied to similarity search
for few-shot learning on the Omniglot dataset, ACAM outperforms ternary CAM
(TCAM), with projected simulation results showing 5% higher inference
accuracy, a 3x denser memory architecture, and more than 100x faster
similarity search than central processing units (CPUs) and graphics
processing units (GPUs) on scaled CMOS nodes. We also demonstrate 1-step
inference on a kernel regression model by combining non-linear kernel
computation and matrix multiplication in ACAM, with simulation estimates
indicating 1,000x faster inference than CPU and GPU.
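The match-window behavior described above can be sketched behaviorally: each ACAM cell stores an analog window [lo, hi] and matches when the query value falls inside it, and a row's similarity to the query can be scored by how many of its cells match. The window encoding and the count-based similarity rule here are assumptions for illustration, not the paper's circuit.

```python
def cell_in_window(query: float, lo: float, hi: float) -> bool:
    """One ACAM cell matches if the analog input lies within its stored window."""
    return lo <= query <= hi

def row_similarity(query_vec, windows) -> int:
    """Row score: the number of cells whose window contains the query value."""
    return sum(cell_in_window(q, lo, hi)
               for q, (lo, hi) in zip(query_vec, windows))

stored = [(0.1, 0.3), (0.4, 0.6), (0.0, 1.0)]  # one stored row of windows
print(row_similarity([0.2, 0.9, 0.5], stored))  # 2: the second cell misses
```

Because each window's width and position can be tuned continuously, one analog cell can encode what would take several binary or ternary cells, which is where the claimed density advantage over TCAM comes from.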
High-Density Solid-State Memory Devices and Technologies
This Special Issue aims to examine high-density solid-state memory devices and technologies from various standpoints in an attempt to foster their continuous success in the future. Considering that a broadening range of applications will likely offer different types of solid-state memories their chance in the spotlight, the Special Issue is not focused on a specific storage solution but rather embraces all the most relevant solid-state memory devices and technologies currently on stage. The subjects dealt with in this Special Issue are accordingly wide-ranging, from process and design issues and innovations to the experimental and theoretical analysis of device operation, and from the performance and reliability of memory devices and arrays to the exploitation of solid-state memories in pursuit of new computing paradigms.
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. This, in turn, improves IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced to date for VLSI design and manufacturing. Moreover, we discuss the
scope of future AI/ML applications at various abstraction levels to
revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations.
MOVING OBJECT DETECTION WITH MEMRISTIVE CROSSBAR ARRAYS
This thesis is dedicated to the hardware implementation of a novel moving
object detection algorithm. The proposed circuit comprises several stages, each
of which implements a particular step of the algorithm. The four higher bit
planes are extracted from a grayscale image and stored in memristive crossbar
arrays, and the respective bit planes are compared via memristive threshold
logic gates in an XOR configuration. In the next stage, the compared bit planes
are combined by weighted summation, with the highest weight assigned to the MSB
plane and smaller weights to the less significant bit planes. After the
summation stage, the resulting grayscale image is thresholded to obtain a
binary image. The last stage is implemented via a memristive
content-addressable memory array, which serves two purposes. It acts as the
long-term memory of the proposed circuit, in contrast to the crossbar arrays,
which serve as its short-term memory. The content-addressable memory is
updated based on the row-by-row difference between the first and second pairs
of frames processed by the previous stages. It also allows the direction and
velocity of object movement to be analyzed by observing the discharge of the
row capacitors. Simulations show that the accuracy of the proposed circuit
increases with array size. A delay analysis of the circuit is carried out, and
power and area calculations show that the proposed circuit is a viable
candidate as a co-processing unit for existing image sensors.
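The first three stages of the pipeline described above, bit-plane extraction, per-plane XOR comparison, weighted summation, and thresholding, can be sketched in software. The plane weights and the threshold below are illustrative choices, not the values used in the thesis.

```python
import numpy as np

def motion_mask(frame_a, frame_b, threshold=8):
    """Binary motion image from two 8-bit grayscale frames.

    Compares the four higher bit planes (bits 4-7) of the frames via XOR,
    sums the results with weights that grow with bit significance, and
    thresholds the weighted sum.
    """
    diff = np.zeros(frame_a.shape, dtype=np.int32)
    for p in range(4, 8):                      # four higher bit planes
        xa = (frame_a >> p) & 1
        xb = (frame_b >> p) & 1
        diff += (xa ^ xb) * (1 << (p - 4))     # MSB plane gets the top weight
    return (diff >= threshold).astype(np.uint8)

a = np.full((4, 4), 200, dtype=np.uint8)
b = a.copy()
b[1:3, 1:3] = 40                               # a "moving object" alters pixels
print(motion_mask(a, b))                       # 1s mark the changed region
```

In the hardware, each XOR is a memristive threshold logic gate and the weighted sum is formed in the analog domain, so the whole comparison happens in the arrays rather than in a pixel loop.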