Memristor MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High Performance Search Engines
Large-capacity Content Addressable Memory (CAM) is a key element in a wide
variety of applications. The inevitable complexities of scaling MOS transistors
introduce a major challenge in the realization of such systems. Convergence of
disparate technologies, which are compatible with CMOS processing, may allow
extension of Moore's Law for a few more years. This paper provides a new
approach towards the design and modeling of Memristor (Memory resistor) based
Content Addressable Memory (MCAM) using a combination of memristor MOS devices
to form the core of a memory/compare logic cell that forms the building block
of the CAM architecture. The non-volatile characteristic and nanoscale
geometry of the memristor, together with its compatibility with CMOS processing
technology, increase the packing density, enable new approaches to
power management by disabling CAM blocks without loss of stored data,
reduce power dissipation, and offer scope for speed improvement as the
technology matures.
Comment: 10 pages, 11 figures
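The memory/compare cell described above can be sketched behaviorally. In this illustrative model (class and variable names and the resistance values are hypothetical, not from the paper), the memristor's resistance state encodes the stored bit and the compare logic emulates a wired-AND match line that stays asserted only when every cell matches:

```python
# Behavioral sketch of a memristor-based CAM cell (illustration only;
# resistance values and encoding are assumptions, not the paper's design).

R_LOW, R_HIGH = 1e3, 1e6  # illustrative ON/OFF memristor resistances, ohms

class MCAMCell:
    """One memory/compare cell: the memristor resistance stores the bit,
    the compare logic reports a match against a search bit."""
    def __init__(self, bit: int):
        # Assumed encoding: low resistance = 1, high resistance = 0.
        self.resistance = R_LOW if bit else R_HIGH

    @property
    def bit(self) -> int:
        return 1 if self.resistance == R_LOW else 0

    def matches(self, search_bit: int) -> bool:
        return self.bit == search_bit

def word_match(cells, search_word):
    """Match line stays high only if every cell matches (wired-AND)."""
    return all(c.matches(b) for c, b in zip(cells, search_word))

word = [MCAMCell(b) for b in (1, 0, 1, 1)]
print(word_match(word, (1, 0, 1, 1)))  # True
print(word_match(word, (1, 1, 1, 1)))  # False
```

The non-volatility claim maps onto this model directly: the `resistance` state persists without power, so a disabled block keeps its contents.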
Self-checking on-line testable static RAM
This is a fault-tolerant random access memory for use in fault-tolerant computers. It comprises a plurality of memory chips, each comprising a plurality of on-line testable and correctable memory cells disposed in rows and columns for holding individually addressable binary bits, with provision for error detection incorporated into each memory cell for outputting an error signal whenever a transient error occurs therein.
In one embodiment, each of the memory cells comprises a pair of static memory sub-cells for simultaneously receiving and holding a common binary data bit written to the memory cell, and the error detection provision comprises comparator logic for continuously sensing and comparing the contents of the memory sub-cells to one another and for outputting the error signal whenever the contents do not match.
In another embodiment, each of the memory cells comprises a static memory sub-cell and a dynamic memory sub-cell for simultaneously receiving and holding a common binary data bit written to the memory cell, and the error detection provision comprises comparator logic for continuously sensing and comparing the contents of the static memory sub-cell to the dynamic memory sub-cell and for outputting the error signal whenever the contents do not match. Capability for correction of errors is also included.
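The duplicated-sub-cell embodiment can be illustrated with a small behavioral model (hypothetical, for illustration only; names are not from the patent): the written bit is held in two sub-cells, and a comparator flags an error whenever they disagree:

```python
# Behavioral sketch of a self-checking memory cell with duplicated
# sub-cells (illustration only, not the patent's circuit).

class SelfCheckingCell:
    def __init__(self):
        self.sub_a = 0
        self.sub_b = 0

    def write(self, bit: int):
        # A write stores the same bit in both sub-cells simultaneously.
        self.sub_a = bit
        self.sub_b = bit

    def inject_transient_fault(self):
        # Flip only one sub-cell, as a transient upset would.
        self.sub_a ^= 1

    @property
    def error(self) -> bool:
        # Comparator logic: continuous comparison of the two sub-cells.
        return self.sub_a != self.sub_b

cell = SelfCheckingCell()
cell.write(1)
print(cell.error)             # False: sub-cells agree
cell.inject_transient_fault()
print(cell.error)             # True: mismatch raises the error signal
```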
Novel low power CAM architecture
One special type of memory used for high-speed address lookup in routers, or cache address lookup in a processor, is Content Addressable Memory (CAM). CAM can also be used in pattern recognition applications where an input pattern must be checked against stored patterns for a match. Compared to Static Random Access Memory, CAM has an additional comparison circuit in each memory bit. This comparison circuit gives CAM the capability of searching the entire memory in one clock cycle. Its hardware parallel comparison architecture makes CAM an ideal candidate for any high-speed data lookup or address processing application. Because of its high power demand, however, CAM is not often used in mobile devices. To take advantage of CAM on portable devices, it is necessary to reduce its power consumption, and for this reason much research has been conducted on methods and techniques for reducing the overall power. The objective is to incorporate circuit and power reduction techniques in a new architecture to further reduce CAM's energy consumption. The new CAM architecture demonstrates reduction of both dynamic and static power dissipation in a 65 nm process. This thesis presents a novel CAM architecture that reduces power consumption significantly compared to the traditional CAM architecture, with minimal or no performance loss. Comparisons with other previously proposed architectures, implemented under the same 65 nm process, are also presented. Results show the novel CAM architecture consumes only 4.021 mW of power, compared to 12.538 mW for the traditional CAM architecture at 800 MHz, and is more energy efficient than all other previously proposed designs.
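CAM's one-cycle parallel search semantics can be modeled behaviorally (a sketch of the lookup semantics, not a circuit or the thesis's design): every stored word is compared against the search key and the addresses of all matching entries are reported:

```python
# Behavioral model of a CAM lookup: conceptually, all stored words are
# compared against the key in parallel within one clock cycle; here the
# loop just models that comparison, not its timing.

def cam_search(table, key):
    """Return the addresses whose stored word equals the search key."""
    return [addr for addr, word in enumerate(table) if word == key]

routing_table = [0b1010, 0b0110, 0b1010, 0b0001]
print(cam_search(routing_table, 0b1010))  # [0, 2]
print(cam_search(routing_table, 0b1111))  # []
```

This is the inverse of RAM addressing: RAM maps an address to data, while CAM maps data to the addresses holding it.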
Reconfigurable nanoelectronics using graphene based spintronic logic gates
This paper presents a novel design concept for spintronic nanoelectronics
that emphasizes a seamless integration of spin-based memory and logic circuits.
The building blocks are magneto-logic gates based on a hybrid
graphene/ferromagnet material system. We use network search engines as a
technology demonstration vehicle and present a spin-based circuit design with
smaller area, faster speed, and lower energy consumption than the
state-of-the-art CMOS counterparts. The design can also be applied to
data compression, coding, and image recognition. In the
proposed scheme, over 100 spin-based logic operations are carried out before
any spin-charge conversion is needed. Consequently, the supporting CMOS
electronics consume little power. The spintronic-CMOS integrated
system can be implemented on a single 3-D chip. These nonvolatile logic
circuits hold potential for a paradigm shift in computing applications.
Comment: 14 pages (single column), 6 figures
Associative Memory Based Experience Replay for Deep Reinforcement Learning
Experience replay is an essential component in deep reinforcement learning
(DRL): it stores experiences and supplies them to the agent for learning in
real time. Recently, prioritized experience replay (PER) has been
proven to be powerful and widely deployed in DRL agents. However, implementing
PER on traditional CPU or GPU architectures incurs significant latency overhead
due to its frequent and irregular memory accesses. This paper proposes a
hardware-software co-design approach to design an associative memory (AM) based
PER, AMPER, with an AM-friendly priority sampling operation. AMPER replaces the
widely used but time-costly tree-traversal-based priority sampling in PER while
preserving the learning performance. Further, we design an in-memory computing
hardware architecture based on AM to support AMPER by leveraging parallel
in-memory search operations. AMPER shows comparable learning performance while
achieving 55x to 270x latency improvement when running on the proposed hardware
compared to the state-of-the-art PER running on GPU.
Comment: 9 pages, 9 figures. The work was accepted by the 41st International
Conference on Computer-Aided Design (ICCAD), 2022, San Diego
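The tree-traversal-based priority sampling that AMPER replaces is commonly implemented as a sum tree: internal nodes hold the sum of their children's priorities, and sampling walks from the root in O(log N). A minimal sketch of that baseline (not AMPER itself, and not the paper's code):

```python
import random

# Sum-tree priority sampling, the conventional PER baseline that AMPER's
# AM-friendly sampling replaces. Array layout: node i has children 2i and
# 2i+1; leaves occupy indices [capacity, 2*capacity).

class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)

    def update(self, idx, priority):
        """Set a leaf's priority and propagate sums up to the root."""
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self):
        """Draw a leaf index with probability proportional to its priority."""
        s = random.uniform(0.0, self.tree[1])
        i = 1
        while i < self.capacity:
            if s <= self.tree[2 * i]:
                i = 2 * i          # descend left
            else:
                s -= self.tree[2 * i]
                i = 2 * i + 1      # descend right
        return i - self.capacity

tree = SumTree(4)
for idx, p in enumerate([1.0, 0.0, 3.0, 0.0]):
    tree.update(idx, p)
# Only indices 0 and 2 have nonzero priority, so only they can be drawn.
print({tree.sample() for _ in range(200)})
```

Each draw costs a root-to-leaf traversal with data-dependent branches, which is exactly the irregular memory-access pattern the paper identifies as the latency bottleneck on CPUs and GPUs.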
Core interface optimization for multi-core neuromorphic processors
Hardware implementations of Spiking Neural Networks (SNNs) represent a
promising approach to edge computing for applications that require low power
and low latency and cannot resort to external cloud-based computing
services. However, most solutions proposed so far either support only
relatively small networks or take up significant hardware resources to
implement large ones. Realizing large-scale and scalable SNNs requires
an efficient asynchronous communication and routing fabric
that enables the design of multi-core architectures. In particular, the core
interface that manages inter-core spike communication is a crucial component, as
it represents the Power-Performance-Area (PPA) bottleneck, especially in the
arbitration architecture and the routing memory. In this paper we present an
arbitration mechanism with the corresponding asynchronous encoding pipeline
circuits, based on hierarchical arbiter trees. The proposed scheme reduces the
latency by more than 70% in sparse-event mode, compared to the state-of-the-art
arbitration architectures, with lower area cost. The routing memory makes use
of asynchronous Content Addressable Memory (CAM) with Current Sensing
Completion Detection (CSCD), which saves approximately 46% energy, and achieves
a 40% increase in throughput against conventional asynchronous CAM using
configurable delay lines, at the cost of only a slight increase in area. In
addition, as they radically reduce the core interface resources in multi-core
neuromorphic processors, the arbitration and CAM architectures we
propose can also be applied to a wide range of general asynchronous circuits
and systems.
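The idea of a hierarchical arbiter tree can be illustrated with a simplified software model (not the paper's asynchronous circuit, and with deterministic rather than metastability-resolving contention handling): each two-input stage grants one of its subtrees, and only subtrees with a pending request propagate a request upward:

```python
# Simplified model of hierarchical tree arbitration (illustration only).
# A real asynchronous arbiter resolves simultaneous requests with a
# mutual-exclusion element; this sketch favors the left subtree.

def arbitrate(requests, lo=0, hi=None):
    """Return the index of the granted requester, or None if idle."""
    if hi is None:
        hi = len(requests)
    if hi - lo == 1:
        return lo if requests[lo] else None
    mid = (lo + hi) // 2
    left = arbitrate(requests, lo, mid)       # left subtree's candidate
    if left is not None:
        return left
    return arbitrate(requests, mid, hi)       # else try the right subtree

print(arbitrate([0, 0, 1, 1]))  # 2: leftmost pending request is granted
print(arbitrate([0, 0, 0, 0]))  # None: no requests pending
```

The sparse-event advantage reported above corresponds to idle subtrees contributing nothing: whole branches with no requests are skipped rather than polled.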
Low-Power High-Performance Ternary Content Addressable Memory Circuits
Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability. They are attractive for applications such as packet forwarding and classification in network routers. Despite these attractive features, high power consumption is one of the most critical challenges faced by TCAM designers. This work proposes circuit techniques for reducing TCAM power consumption. The main contribution of this work is divided into two parts: (i) reduction in match line (ML) sensing energy, and (ii) static-power reduction techniques. The ML sensing energy is reduced by employing (i) positive-feedback ML sense amplifiers (MLSAs), (ii) low-capacitance comparison logic, and (iii) low-power ML-segmentation techniques. The positive-feedback MLSAs include both resistive and active feedback to reduce the ML sensing energy. A body-bias technique can further improve the feedback action at the expense of additional area and ML capacitance. Measurement results of the active-feedback MLSA show 50-56% reduction in ML sensing energy. Measurement results of the proposed low-capacitance comparison logic show 25% and 42% reductions in ML sensing energy and time, respectively, which can be further improved by careful layout. The low-power ML-segmentation techniques include a dual-ML TCAM and a charge-shared ML. Simulation results of the dual-ML TCAM, which connects the two sides of the comparison logic to two ML segments for sequential sensing, show 43% power savings for a small (4%) trade-off in search speed. The charge-shared ML scheme achieves power savings by partially recycling the charge stored in the first ML segment. Chip measurement results show that the charge-shared ML scheme yields 11% and 9% reductions in ML sensing time and energy, respectively, which can be improved to 19-25% by using a digitally controlled charge-sharing time window and a slightly modified MLSA.
The static power reduction is achieved by a dual-VDD technique and low-leakage TCAM cells. The dual-VDD technique trades off the excess noise margin of the MLSA for smaller cell leakage by applying a smaller VDD to the TCAM cells and a larger VDD to the peripheral circuits. The low-leakage TCAM cells trade off the speed of READ and WRITE operations for smaller cell area and leakage. Finally, the design and testing of a complete TCAM chip are presented and compared with other published designs.
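TCAM's bit-level masking can be sketched behaviorally (illustrative only, not the circuits described above): each stored entry is a string over {'0', '1', 'X'}, where 'X' matches either search bit, and the first matching entry wins, as in longest-prefix packet classification where more specific rules are stored at higher priority:

```python
# Behavioral model of ternary matching: 'X' is a don't-care bit that
# matches both '0' and '1' in the search key.

def tcam_match(entry, key):
    """True if every entry bit equals the key bit or is a don't-care."""
    return all(e in ('X', k) for e, k in zip(entry, key))

def tcam_lookup(entries, key):
    """Return the index of the first (highest-priority) matching entry."""
    for i, entry in enumerate(entries):
        if tcam_match(entry, key):
            return i
    return None

rules = ["1010", "10XX", "XXXX"]   # most-specific rule stored first
print(tcam_lookup(rules, "1010"))  # 0: exact rule wins
print(tcam_lookup(rules, "1011"))  # 1: '10XX' matches before the catch-all
print(tcam_lookup(rules, "0000"))  # 2: only the catch-all matches
```

In hardware, every entry's match line evaluates simultaneously and a priority encoder selects the winning address; the sequential loop here models only the semantics, and the parallel evaluation of all those match lines is precisely why ML sensing energy dominates TCAM power.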