
    Memristor MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High Performance Search Engines

    Large-capacity Content Addressable Memory (CAM) is a key element in a wide variety of applications. The inevitable complexities of scaling MOS transistors introduce a major challenge in the realization of such systems. Convergence of disparate technologies that are compatible with CMOS processing may allow extension of Moore's Law for a few more years. This paper provides a new approach towards the design and modeling of a memristor (memory resistor) based Content Addressable Memory (MCAM), using a combination of memristor and MOS devices to form the core of a memory/compare logic cell that serves as the building block of the CAM architecture. The non-volatile characteristic and nanoscale geometry of the memristor, together with its compatibility with CMOS processing technology, increase the packing density, enable new approaches to power management by disabling CAM blocks without loss of stored data, reduce power dissipation, and offer scope for speed improvement as the technology matures.
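    As a point of reference for what a CAM does differently from a RAM, the following is a minimal behavioral sketch in Python (illustrative only, not the paper's memristor/MOS circuit): the memory is queried by content rather than by address, and every stored word is compared against the search key in parallel, returning all matching addresses at once.

```python
# Behavioral sketch of a CAM: every stored word is compared against the
# search key in parallel and all matching addresses are reported at once.
# This models function only, not the memristor/MOS cell of the paper.

class CAM:
    def __init__(self, width: int, depth: int):
        self.width = width
        self.rows = [None] * depth  # None models an empty/disabled row

    def write(self, addr: int, word: int) -> None:
        assert 0 <= word < (1 << self.width)
        self.rows[addr] = word

    def search(self, key: int) -> list[int]:
        # In hardware all rows compare simultaneously; here we just scan.
        return [a for a, w in enumerate(self.rows) if w == key]

cam = CAM(width=8, depth=16)
cam.write(3, 0xA5)
cam.write(9, 0xA5)
print(cam.search(0xA5))  # -> [3, 9]
```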

    Self-checking on-line testable static RAM

    This is a fault-tolerant random access memory for use in fault-tolerant computers. It comprises a plurality of memory chips, each comprising a plurality of on-line testable and correctable memory cells disposed in rows and columns for holding individually addressable binary bits, with error detection incorporated into each memory cell for outputting an error signal whenever a transient error occurs therein. In one embodiment, each memory cell comprises a pair of static memory sub-cells that simultaneously receive and hold a common binary data bit written to the cell, and the error detection comprises comparator logic that continuously senses and compares the contents of the two sub-cells and outputs the error signal whenever they do not match. In another embodiment, each memory cell comprises a static memory sub-cell and a dynamic memory sub-cell that simultaneously receive and hold a common binary data bit, with comparator logic that continuously compares the static sub-cell against the dynamic sub-cell and outputs the error signal on a mismatch. Capability for correction of errors is also included.
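    A minimal behavioral sketch of the first embodiment, in Python (function only, not the patented circuit): a cell writes one bit into two sub-cells, and comparator logic flags an error whenever their continuously compared contents disagree.

```python
# Behavioral sketch (not the patent's circuit): a memory cell writes one
# bit into two sub-cells and raises an error flag whenever the
# continuously-compared contents disagree, e.g. after a transient upset.

class SelfCheckingCell:
    def __init__(self):
        self.sub_a = 0  # first sub-cell (static)
        self.sub_b = 0  # second sub-cell (static or dynamic, per embodiment)

    def write(self, bit: int) -> None:
        self.sub_a = self.sub_b = bit & 1  # same bit written to both

    def read(self) -> int:
        return self.sub_a

    @property
    def error(self) -> bool:
        # Comparator logic: any mismatch signals a transient error.
        return self.sub_a != self.sub_b

cell = SelfCheckingCell()
cell.write(1)
cell.sub_b ^= 1     # inject a transient bit-flip
print(cell.error)   # -> True
```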

    Novel low power CAM architecture

    Content Addressable Memory (CAM) is a special type of memory used for high-speed address lookup in routers or cache address lookup in processors. CAM can also be used in pattern recognition applications where it must be determined whether a unique pattern matches a stored entry. Compared to Static Random Access Memory, CAM has an additional comparison circuit in each memory bit; this circuit gives CAM the capability to search the entire memory in one clock cycle. Its parallel comparison architecture makes CAM an ideal candidate for high-speed data lookup and address processing applications. Because of its high power demand, however, CAM is rarely used in mobile devices; to take advantage of CAM in portable devices, its power consumption must be reduced. For this reason, much research has investigated methods and techniques for reducing the overall power. The objective of this work is to incorporate circuit and power reduction techniques into a new architecture to further reduce CAM’s energy consumption. The new architecture reduces both dynamic and static power dissipation in a 65nm process. This thesis presents a novel CAM architecture that reduces power consumption significantly compared to the traditional CAM architecture, with minimal or no performance loss; comparisons with previously proposed architectures, all implemented in the same 65nm process, are also presented. Results show the novel CAM architecture consumes only 4.021mW at 800MHz, compared to 12.538mW for the traditional CAM architecture, and is more energy efficient than all other previously proposed designs.
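    As a sanity check on the headline numbers: a search that completes in one clock cycle dissipates roughly E = P / f per lookup, and the short Python sketch below applies this to the reported figures.

```python
# Worked check of the abstract's headline numbers: energy per search
# follows from E = P / f when one search completes per clock cycle.
f = 800e6                   # search rate, 800 MHz (one search per cycle)
p_traditional = 12.538e-3   # W, traditional CAM architecture
p_novel = 4.021e-3          # W, proposed CAM architecture

for name, p in [("traditional", p_traditional), ("novel", p_novel)]:
    print(f"{name}: {p / f * 1e12:.2f} pJ per search")
print(f"power reduction: {p_traditional / p_novel:.2f}x")
# traditional: 15.67 pJ per search
# novel: 5.03 pJ per search
# power reduction: 3.12x
```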

    Reconfigurable nanoelectronics using graphene based spintronic logic gates

    This paper presents a novel design concept for spintronic nanoelectronics that emphasizes a seamless integration of spin-based memory and logic circuits. The building blocks are magneto-logic gates based on a hybrid graphene/ferromagnet material system. We use network search engines as a technology demonstration vehicle and present a spin-based circuit design with smaller area, faster speed, and lower energy consumption than state-of-the-art CMOS counterparts. The design also applies to applications such as data compression, coding, and image recognition. In the proposed scheme, over 100 spin-based logic operations are carried out before any spin-charge conversion is needed; consequently, the supporting CMOS electronics consume little power. The spintronic-CMOS integrated system can be implemented on a single 3-D chip. These nonvolatile logic circuits hold potential for a paradigm shift in computing applications.

    Associative Memory Based Experience Replay for Deep Reinforcement Learning

    Experience replay is an essential component of deep reinforcement learning (DRL): it stores past experiences and samples them for the agent to learn from in real time. Recently, prioritized experience replay (PER) has proven powerful and is widely deployed in DRL agents. However, implementing PER on traditional CPU or GPU architectures incurs significant latency overhead due to its frequent and irregular memory accesses. This paper proposes a hardware-software co-design approach: an associative memory (AM) based PER, AMPER, with an AM-friendly priority sampling operation. AMPER replaces the widely used, time-costly tree-traversal-based priority sampling in PER while preserving the learning performance. Further, we design an in-memory computing hardware architecture based on AM to support AMPER by leveraging parallel in-memory search operations. AMPER shows comparable learning performance while achieving a 55x to 270x latency improvement on the proposed hardware compared to state-of-the-art PER running on a GPU.
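    For context, the tree-traversal-based priority sampling that AMPER replaces is typically a sum-tree walked from root to leaf in O(log N) per sample; below is a minimal Python sketch of that baseline (illustrative only, not the paper's code).

```python
# The tree-traversal-based priority sampling that AMPER replaces: a
# sum-tree over priorities, where each sample walks from the root to a
# leaf in O(log N). Minimal sketch, not the paper's implementation.
import random

class SumTree:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # internal nodes hold subtree sums

    def update(self, idx: int, priority: float) -> None:
        i = idx + self.capacity
        delta = priority - self.tree[i]
        while i >= 1:                  # propagate the change up to the root
            self.tree[i] += delta
            i //= 2

    def sample(self) -> int:
        r = random.uniform(0.0, self.tree[1])  # root holds the total priority
        i = 1
        while i < self.capacity:       # descend: go left if r is in left sum
            left = 2 * i
            if r <= self.tree[left]:
                i = left
            else:
                r -= self.tree[left]
                i = left + 1
        return i - self.capacity       # leaf index = experience index

tree = SumTree(8)
for idx, p in enumerate([0.1, 0.9, 0.5, 2.0, 0.0, 0.3, 0.7, 0.2]):
    tree.update(idx, p)
print(tree.sample())  # high-priority experiences (e.g. index 3) dominate
```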

    Core interface optimization for multi-core neuromorphic processors

    Hardware implementations of Spiking Neural Networks (SNNs) represent a promising approach to edge computing for applications that require low power and low latency and cannot resort to external cloud-based computing services. However, most solutions proposed so far either support only relatively small networks or take up significant hardware resources to implement large networks. Realizing large-scale and scalable SNNs requires an efficient asynchronous communication and routing fabric that enables the design of multi-core architectures. In particular, the core interface that manages inter-core spike communication is a crucial component, as it represents the Power-Performance-Area (PPA) bottleneck, especially in the arbitration architecture and the routing memory. In this paper we present an arbitration mechanism, with the corresponding asynchronous encoding pipeline circuits, based on hierarchical arbiter trees. The proposed scheme reduces latency by more than 70% in sparse-event mode compared to state-of-the-art arbitration architectures, at lower area cost. The routing memory uses an asynchronous Content Addressable Memory (CAM) with Current Sensing Completion Detection (CSCD), which saves approximately 46% energy and achieves a 40% increase in throughput over conventional asynchronous CAM using configurable delay lines, at the cost of only a slight increase in area. Moreover, because they radically reduce the core interface resources in multi-core neuromorphic processors, the proposed arbitration and CAM architectures can also be applied to a wide range of general asynchronous circuits and systems.
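    A loose behavioral sketch of a hierarchical arbiter tree, in Python (illustrative only; the paper's design is an asynchronous circuit, and this toy uses a simple fixed-priority 2-input cell): pending requests are resolved pairwise at each level, so a single winner among N requesters emerges after log2(N) stages.

```python
# Behavioral sketch of a hierarchical arbiter tree. Each 2-input cell here
# is fixed-priority (left wins); real asynchronous arbiter cells resolve
# metastability and can be made fair, which this toy does not model.

def arbitrate(requests: list[bool]) -> int | None:
    """Return the index of the granted requester, or None if none pending."""
    if not requests:
        return None
    nodes = [(i, r) for i, r in enumerate(requests)]
    while len(nodes) & (len(nodes) - 1):  # pad to a power of two
        nodes.append((-1, False))         # idle slot, never granted
    while len(nodes) > 1:                 # one tree level per iteration
        next_level = []
        for a, b in zip(nodes[::2], nodes[1::2]):
            # 2-input arbiter cell: grant the left request, else the right.
            next_level.append(a if a[1] else b)
        nodes = next_level
    idx, pending = nodes[0]
    return idx if pending else None

print(arbitrate([False, False, True, True]))  # -> 2 (leftmost pending wins)
```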

    Low-Power High-Performance Ternary Content Addressable Memory Circuits

    Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability. They are attractive for applications such as packet forwarding and classification in network routers. Despite these attractive features, high power consumption is one of the most critical challenges faced by TCAM designers. This work proposes circuit techniques for reducing TCAM power consumption. The main contribution is divided into two parts: (i) reduction in match line (ML) sensing energy, and (ii) static-power reduction techniques.

    The ML sensing energy is reduced by employing (i) positive-feedback ML sense amplifiers (MLSAs), (ii) low-capacitance comparison logic, and (iii) low-power ML-segmentation techniques. The positive-feedback MLSAs use both resistive and active feedback to reduce the ML sensing energy; a body-bias technique can further improve the feedback action at the expense of additional area and ML capacitance. Measurement results of the active-feedback MLSA show a 50-56% reduction in ML sensing energy. Measurement results of the proposed low-capacitance comparison logic show 25% and 42% reductions in ML sensing energy and time, respectively, which careful layout can improve further. The low-power ML-segmentation techniques comprise a dual-ML TCAM and a charge-shared ML. Simulation results of the dual-ML TCAM, which connects the two sides of the comparison logic to two ML segments for sequential sensing, show 43% power savings for a small (4%) trade-off in search speed. The charge-shared ML scheme achieves power savings by partially recycling the charge stored in the first ML segment. Chip measurements show that the charge-shared ML scheme yields 11% and 9% reductions in ML sensing time and energy, respectively, which a digitally controlled charge-sharing time window and a slightly modified MLSA can improve to 19-25%.

    The static-power reduction is achieved by a dual-VDD technique and low-leakage TCAM cells. The dual-VDD technique trades the excess noise margin of the MLSA for smaller cell leakage by applying a smaller VDD to the TCAM cells and a larger VDD to the peripheral circuits. The low-leakage TCAM cells trade the speed of READ and WRITE operations for smaller cell area and leakage. Finally, the design and testing of a complete TCAM chip are presented and compared with other published designs.
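    A behavioral sketch of the TCAM lookup itself, in Python (function only, with hypothetical forwarding-table entries; none of the thesis's circuit techniques appear here): each entry stores 0, 1, or a don't-care per bit, all entries are compared against the key in parallel, and a priority encoder selects the first matching row.

```python
# Behavioral sketch of a TCAM lookup: each entry stores 0, 1, or 'x'
# (don't care) per bit. In hardware every row compares against the key
# simultaneously and a priority encoder picks the first match; the table
# entries below are hypothetical IP-prefix-style rules for illustration.

def matches(entry: str, key: str) -> bool:
    # Bit-level masking: an 'x' position matches either key bit.
    return all(e in (k, "x") for e, k in zip(entry, key))

def tcam_lookup(table: list[tuple[str, str]], key: str) -> str | None:
    for entry, action in table:   # row order models match priority
        if matches(entry, key):
            return action
    return None

forwarding = [
    ("1100x0xx", "port A"),   # more specific rule, higher priority
    ("1100xxxx", "port B"),
    ("xxxxxxxx", "default"),  # catch-all entry
]
print(tcam_lookup(forwarding, "11000011"))  # -> port A
print(tcam_lookup(forwarding, "11001111"))  # -> port B
```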