15 research outputs found

    Content Addressable Memories and Transformable Logic Circuits Based on Ferroelectric Reconfigurable Transistors for In-Memory Computing

    As a promising alternative to the von Neumann architecture, in-memory computing holds the promise of delivering high computing capacity while consuming low power. Content-addressable memories (CAMs) can implement pattern matching and distance measurement in memory with massive parallelism, making them highly desirable for data-intensive applications. In this paper, we propose and demonstrate a novel 1-transistor-per-bit CAM based on the ferroelectric reconfigurable transistor. By exploiting the switchable polarity of the ferroelectric reconfigurable transistor, the XOR/XNOR-like matching operation of a CAM can be realized in a single transistor. By eliminating the need for complementary circuitry, these non-volatile CAMs based on reconfigurable transistors offer a significant improvement in area and energy efficiency over conventional CAMs. NAND and NOR arrays of CAM cells are also demonstrated, which enable multi-bit matching in a single read operation. In addition, the NOR array of CAM cells effectively measures the Hamming distance between the input query and the stored entries. Furthermore, utilizing the switchable polarity of these ferroelectric Schottky barrier transistors, we demonstrate reconfigurable logic gates with dual NAND/NOR functions, whose input-output mapping can be transformed in real time without changing the layout. These reconfigurable circuits will serve as important building blocks for high-density data-stream processors and reconfigurable Application-Specific Integrated Circuits (r-ASICs). The CAMs and transformable logic gates based on ferroelectric reconfigurable transistors will have broad uses in data-intensive applications, from image processing to machine learning and artificial intelligence.
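    The Hamming-distance behaviour of the NOR array described above can be pictured with a small behavioural model: in a NOR-type CAM row, each mismatching cell contributes one unit of match-line discharge, so the aggregate per-row response tracks the Hamming distance between the query and the stored word. The Python sketch below is a functional illustration only (array sizes and names are assumed), not a description of the device-level circuit in the paper.

```python
# Behavioural sketch of NOR-CAM search: each mismatching cell adds one unit of
# match-line discharge, so a row's total equals its Hamming distance from the query.
import numpy as np

def nor_cam_search(stored_words: np.ndarray, query: np.ndarray) -> np.ndarray:
    """stored_words: (rows, bits) 0/1 entries; query: (bits,) 0/1 search key."""
    mismatch = stored_words ^ query      # XOR-like per-cell comparison
    return mismatch.sum(axis=1)          # per-row Hamming distance

rng = np.random.default_rng(0)
entries = rng.integers(0, 2, size=(8, 16))   # 8 stored 16-bit words (illustrative)
key = rng.integers(0, 2, size=16)
distances = nor_cam_search(entries, key)
print(distances, "closest row:", int(distances.argmin()))
```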

    In-memory computing with emerging memory devices: Status and outlook

    Supporting data for "In-memory computing with emerging memory devices: status and outlook", submitted to APL Machine Learning.

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large-scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures to minimize both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement it, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols flashed to a dynamic vision sensor (DVS) at high speed.
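    As an illustration of the routing idea summarized above (hierarchical delivery within a chip, mesh hops between chips, and per-source fan-out tables held in on-chip memory), the following Python sketch models the event flow at a purely functional level. All structures, field names, and table contents are illustrative assumptions; the abstract does not specify the actual packet format or table organisation.

```python
# Functional sketch of hybrid event routing: a per-source lookup table (the
# "heterogeneous memory") maps a spiking neuron's address to its targets;
# targets on the same chip are delivered hierarchically to a local core,
# while off-chip targets are forwarded hop by hop through a 2D mesh.
# All names and table contents below are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source_neuron: int            # address of the spiking neuron
    chip_xy: tuple                # (x, y) mesh position of the source chip

def route(event: Event, fanout_table: dict, my_chip_xy: tuple):
    """Yield one delivery decision per target of an address event."""
    for target_chip, core, neuron in fanout_table.get(event.source_neuron, []):
        if target_chip == my_chip_xy:
            yield ("local", core, neuron)          # hierarchical, intra-chip
        else:
            dx = target_chip[0] - my_chip_xy[0]
            dy = target_chip[1] - my_chip_xy[1]
            direction = (("east" if dx > 0 else "west") if dx != 0
                         else ("north" if dy > 0 else "south"))
            yield (direction, core, neuron)        # mesh, inter-chip

# Example: neuron 42 fans out to a local core and to a chip one hop east.
table = {42: [((0, 0), 1, 7), ((1, 0), 2, 3)]}
for decision in route(Event(42, (0, 0)), table, (0, 0)):
    print(decision)
```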

    Long-Term Memory for Cognitive Architectures: A Hardware Approach Using Resistive Devices

    A cognitive agent capable of reliably performing complex tasks over a long time will acquire a large store of knowledge. To interact with changing circumstances, the agent will need to quickly search and retrieve knowledge relevant to its current context. Real-time knowledge search and cognitive processing of this kind is a challenge for conventional computers, which are not optimised for such tasks. This thesis describes a new content-addressable memory, based on resistive devices, that can perform massively parallel knowledge search in the memory array. The fundamental circuit block that supports this capability is a memory cell that closely couples comparison logic with non-volatile storage. By using resistive devices instead of transistors in both the comparison circuit and the storage elements, this cell improves area density by over an order of magnitude compared to state-of-the-art CMOS implementations. The resulting memory does not need power to maintain stored information and is therefore well suited to cognitive agents with large long-term memories. The memory incorporates activation circuits, which bias the knowledge retrieval process according to past memory access patterns. This is achieved by approximating the widely used base-level activation function, using resistive devices to store, maintain, and compare activation values. By distributing an instance of this circuit to every row in memory, the activation of all memory objects can be updated in parallel. A test on the word sense disambiguation task shows that this circuit-based activation model incurs only a small loss in accuracy compared to exact base-level calculations. A variation of spreading activation can also be achieved in memory. Memory objects are encoded with high-dimensional vectors that create associations between correlated representations. By storing these high-dimensional vectors in the new content-addressable memory, activation can be spread to related objects during search operations. The new memory is scalable, power- and area-efficient, and performs operations in parallel that are infeasible in real time for a sequential processor with a conventional memory hierarchy. Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
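    The base-level activation mentioned above is commonly defined (as in ACT-R) as B_i = ln(sum_j (t_now - t_ij)^(-d)), where the t_ij are the past access times of object i and d is a decay constant, often 0.5. The sketch below computes this quantity for every stored object at once, mimicking the per-row parallel update described in the thesis; the access histories and parameter values are illustrative, and the circuit itself only approximates this function.

```python
# Sketch of the (ACT-R style) base-level activation that the row circuits
# approximate: B_i = ln( sum_j (t_now - t_ij)^(-d) ).  Histories and the decay
# constant below are illustrative values, not measurements from the thesis.
import numpy as np

def base_level_activation(access_times, t_now, d=0.5):
    """access_times: one array of past access timestamps per memory object."""
    return np.array([
        np.log(np.sum((t_now - t) ** (-d))) if len(t) else -np.inf
        for t in access_times
    ])

# Three memory objects with different access histories (timestamps in seconds).
history = [np.array([10.0, 60.0, 110.0]),   # accessed often, including recently
           np.array([5.0]),                 # accessed once, long ago
           np.array([115.0])]               # accessed once, just now
print(base_level_activation(history, t_now=120.0))
```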

    Adaptive extreme edge computing for wearable devices

    Wearable devices are a fast-growing technology with an impact on personal healthcare for both society and the economy. Due to the widespread deployment of sensors in pervasive and distributed networks, power consumption, processing speed, and system adaptation are vital in future smart wearable devices. The visioning and forecasting of how to bring computation to the edge in smart sensors have already begun, with the aspiration of providing adaptive extreme edge computing. Here, we provide a holistic view of hardware and theoretical solutions towards smart wearable devices that can guide research in this pervasive computing era. We propose various solutions for biologically plausible models for continual learning in neuromorphic computing technologies for wearable sensors. To envision this concept, we provide a systematic outline of the prospective low-power and low-latency scenarios expected for wearable sensors on neuromorphic platforms. We successively describe the key prospective landscapes of neuromorphic processors exploiting complementary metal-oxide-semiconductor (CMOS) and emerging memory technologies (e.g., memristive devices). Furthermore, we evaluate the requirements for edge computing within wearable devices in terms of footprint, power consumption, latency, and data size. We additionally investigate the challenges, beyond neuromorphic computing hardware, algorithms, and devices, that could impede the enhancement of adaptive edge computing in smart wearable devices.

    Analog Content-Addressable Memory from Complementary FeFETs

    To address the increasing computational demands of artificial intelligence (AI) and big data, compute-in-memory (CIM) integrates memory and processing units in the same physical location, reducing the time and energy overhead of the system. Despite advancements in non-volatile memory (NVM) for matrix multiplication, other critical data-intensive operations, such as parallel search, have been overlooked. Current parallel-search architectures, namely content-addressable memories (CAMs), often use binary encoding, which restricts density and functionality. We present an analog CAM (ACAM) cell, built on two complementary ferroelectric field-effect transistors (FeFETs), that performs parallel search in the analog domain with over 40 distinct match windows. We then deploy it to calculate the similarity between vectors, a building block of the two machine learning problems that follow. ACAM outperforms ternary CAM (TCAM) when applied to similarity search for few-shot learning on the Omniglot dataset, with projected simulation results showing a 5% improvement in inference accuracy, a 3x denser memory architecture, and more than 100x faster speed per similarity search than a central processing unit (CPU) or graphics processing unit (GPU) on scaled CMOS nodes. We also demonstrate 1-step inference on a kernel regression model by combining non-linear kernel computation and matrix multiplication in the ACAM, with simulation estimates indicating 1,000x faster inference than CPU and GPU.
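    The analog matching operation described above can be pictured with a simple behavioural model: each ACAM cell stores the lower and upper bound of a match window (set by the two complementary FeFETs), an analog input matches a cell if it falls inside that window, and the similarity between a query vector and a stored row is the number of matching cells. The Python sketch below is a functional illustration under these assumptions; window widths, vector sizes, and names are invented for the example.

```python
# Behavioural sketch of analog-CAM similarity: each cell stores a match window
# [low, high]; an analog input matches a cell if it lies inside the window, and
# a row's similarity score is its count of matching cells.  Window widths and
# dimensions are invented for the example.
import numpy as np

def acam_similarity(lows, highs, query):
    """lows, highs: (rows, dims) window bounds; query: (dims,) analog input."""
    inside = (query >= lows) & (query <= highs)    # per-cell match decision
    return inside.sum(axis=1)                      # per-row match count

rng = np.random.default_rng(1)
centers = rng.uniform(0.0, 1.0, size=(4, 8))         # 4 stored 8-dim prototypes
width = 0.1                                          # one assumed window setting
lows, highs = centers - width / 2, centers + width / 2
query = centers[2] + rng.normal(0.0, 0.02, size=8)   # noisy query near row 2
print(acam_similarity(lows, highs, query))           # row 2 should score highest
```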

    High-Density Solid-State Memory Devices and Technologies

    This Special Issue aims to examine high-density solid-state memory devices and technologies from various standpoints in an attempt to foster their continued success in the future. Considering that a broadening range of applications will likely give different types of solid-state memories their chance in the spotlight, the Special Issue is not focused on a specific storage solution but rather embraces all the most relevant solid-state memory devices and technologies currently on stage. The subjects dealt with are accordingly wide-ranging, spanning process and design issues and innovations, experimental and theoretical analyses of device operation, the performance and reliability of memory devices and arrays, and the exploitation of solid-state memories to pursue new computing paradigms.

    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and therefore time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort needed to understand and process data within and across different abstraction levels via automated learning algorithms. This, in turn, improves IC yield and reduces manufacturing turnaround time. This paper thoroughly reviews the automated AI/ML approaches introduced to date for VLSI design and manufacturing. Moreover, we discuss the future scope of AI/ML applications at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

    Moving Object Detection with Memristive Crossbar Arrays

    This thesis is dedicated to the hardware implementation of a novel moving object detection algorithm. The proposed circuit includes several stages, each of which implements a particular step of the algorithm. The four most significant bit planes are extracted from a grayscale image and stored in memristive crossbar arrays, and the respective bit planes are compared via memristive threshold logic gates in an XOR configuration. In the next stage, the compared bit planes are combined by weighted summation, with the highest weight assigned to the MSB plane and smaller weights to the less significant bit planes. After the summation stage, the resulting grayscale image is thresholded to obtain a binary image. The last stage is implemented via a memristive content-addressable memory array, which serves two purposes. It acts as long-term memory, in contrast to the crossbar arrays, which serve as the short-term memory of the proposed circuit. The content-addressable memory is updated based on the row-by-row difference between the first and second pairs of frames processed by the previous stages. It also allows the direction and velocity of object movement to be analyzed by observing the discharge of the row capacitors. Simulations show that the accuracy of the proposed circuit increases with larger array sizes. A delay analysis of the circuit is carried out, and power and area calculations show that the proposed circuit is a viable candidate as a co-processor for existing image sensors.
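    The frame-comparison pipeline described above (bit-plane extraction, XOR comparison, MSB-weighted summation, and thresholding) can be summarised numerically with the following Python sketch. It is a behavioural illustration of the algorithmic steps only; the weights, threshold value, and image size are assumed for the example and do not reflect the memristive circuit parameters used in the thesis.

```python
# Numerical sketch of the frame-comparison steps: take the four most significant
# bit planes of two frames, XOR the corresponding planes, combine them with
# MSB-weighted summation, and threshold the result into a binary motion map.
# Weights, threshold, and image size are illustrative choices.
import numpy as np

def bit_planes(img: np.ndarray, n: int = 4) -> np.ndarray:
    """Return the n most significant bit planes of an 8-bit image, MSB first."""
    return np.stack([(img >> b) & 1 for b in range(7, 7 - n, -1)])

def motion_map(frame_a, frame_b, weights=(8, 4, 2, 1), threshold=8):
    diff = bit_planes(frame_a) ^ bit_planes(frame_b)   # per-plane XOR comparison
    w = np.array(weights).reshape(-1, 1, 1)            # MSB gets the largest weight
    score = (diff * w).sum(axis=0)                     # weighted summation stage
    return (score >= threshold).astype(np.uint8)       # thresholding stage

rng = np.random.default_rng(2)
frame1 = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
frame2 = frame1.copy()
frame2[10:16, 10:16] = 255                             # a "moving" bright patch
print(int(motion_map(frame1, frame2).sum()), "changed pixels detected")
```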