136 research outputs found

    High Performance Pre-computation based Self-Controlled Precharge-Free Content-Addressable Memory

    Content-addressable memory (CAM) is a special type of memory used in networking applications for very-high-speed search operations. It compares input search data against a table of stored data in parallel and returns the address of the matching data. Although parallel comparison reduces search time, it also significantly increases power consumption. Both the low-power NAND-type and the high-speed NOR-type CAM require a precharge phase prior to the search. This precharge phase increases the settling time of the output and reduces the speed of the search operation. In this paper, a high-performance Pre-computation Based Self-Controlled Precharge-Free CAM (PB-SCPF CAM) structure is proposed for high-speed applications; it reduces the settling time and improves the speed of the search. The SCPF architecture is particularly effective in applications with larger word lengths, where search time is critical. The experimental results show that the PB-SCPF approach attains, on average, a 32% power reduction and an 80% delay reduction. The most important contribution of this work is that it offers theoretical and practical proofs that the proposed PB-SCPF CAM system can achieve greater power reduction without requiring a special CAM cell design. This shows that the approach is flexible and adaptable to general designs and high-speed applications.
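
The search behaviour described in this abstract (compare a search word against every stored word and return the matching address) can be modelled functionally as below. This is only a behavioural sketch: the function name `cam_search` and the sample table are illustrative assumptions, and the loop stands in for comparisons that real CAM hardware performs on all rows at once. It does not model precharge, pre-computation, or the PB-SCPF circuit itself.

```python
def cam_search(table, key):
    """Return the addresses (indices) of every stored word equal to key.

    Hardware compares all rows simultaneously; this loop is only a
    sequential stand-in for that parallel comparison.
    """
    return [addr for addr, word in enumerate(table) if word == key]


# Hypothetical 4-entry table of 4-bit words.
table = [0b1010, 0b0111, 0b1010, 0b0001]

print(cam_search(table, 0b1010))  # addresses of the rows holding 1010
print(cam_search(table, 0b1111))  # no row matches, so the list is empty
```

Returning a list rather than a single address keeps multiple matches visible; a real design would typically resolve them with a priority encoder.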

    Compact and High-Performance TCAM Based on Scaled Double-Gate FeFETs

    Ternary content addressable memory (TCAM), widely used in network routers and high-associativity caches, is gaining popularity in machine learning and data-analytic applications. Ferroelectric FETs (FeFETs) are a promising candidate for implementing TCAM owing to their high ON/OFF ratio, non-volatility, and CMOS compatibility. However, conventional single-gate FeFETs (SG-FeFETs) suffer from relatively high write voltage, low endurance, and potential read disturbance, and they face scaling challenges. Recently, a double-gate FeFET (DG-FeFET) has been proposed that outperforms SG-FeFETs in many aspects. This paper investigates TCAM design challenges specific to DG-FeFETs and introduces a novel 1.5T1Fe TCAM design based on DG-FeFETs. A 2-step search with early termination is employed to reduce the cell area and improve energy efficiency. A shared driver design is proposed to reduce the peripheral area. Detailed analysis and SPICE simulation show that the 1.5T1Fe DG-TCAM leads to superior search speed and energy efficiency. The 1.5T1Fe TCAM design can also be built with SG-FeFETs, achieving search latency and energy improvements compared with the 2FeFET TCAM. Comment: Accepted by Design Automation Conference (DAC) 202
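
For readers unfamiliar with ternary matching, the following sketch shows what a TCAM lookup computes at the functional level: each stored bit is 0, 1, or "don't care". The `(value, care_mask)` encoding, the name `tcam_search`, and the lowest-address-wins priority rule are assumptions made for illustration. The sketch does not model the 1.5T1Fe cell or the 2-step search with early termination described in the abstract.

```python
def tcam_search(entries, key):
    """Return the address of the first entry that matches key, or None.

    Each entry is (value, care_mask): a 1 bit in care_mask means the
    corresponding bit of value must equal the key bit; a 0 bit means
    "don't care". Lowest address wins (an assumed priority rule).
    """
    for addr, (value, care_mask) in enumerate(entries):
        # XOR marks differing bits; the mask discards don't-care positions.
        if (key ^ value) & care_mask == 0:
            return addr
    return None


# Hypothetical 4-bit table: 1010, 10XX, XXXX.
entries = [
    (0b1010, 0b1111),  # exact match on 1010
    (0b1000, 0b1100),  # 10XX: only the top two bits are compared
    (0b0000, 0b0000),  # XXXX: matches any key
]

print(tcam_search(entries, 0b1001))  # falls through to the 10XX entry
```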

    An Energy-Efficient Design Paradigm for a Memory Cell Based on Novel Nanoelectromechanical Switches

    In this chapter, we describe the NEMsCAM cell, a new content-addressable memory (CAM) cell designed using both CMOS technology and nanoelectromechanical (NEM) switches. The memory part of NEMsCAM is built from two complementary nonvolatile NEM switches and is located on top of the CMOS-based comparison component. As a use case, we evaluate first-level instruction and data translation lookaside buffers (TLBs) with 16 nm CMOS technology at 2 GHz. The simulation results demonstrate that the NEMsCAM TLB reduces the energy consumption per search operation (by 27%), in standby mode (by 53.9%), and per write operation (by 41.9%), as well as the area (by 40.5%), compared to a CMOS-only TLB, with minimal performance overhead.

    A High Performance DDR3 SDRAM Controller

    The paper presents the implementation of a compliant DDR3 memory controller. It discusses the overall architecture of the DDR3 controller, the detailed design and operation of its individual sub-blocks, and the pipelining implemented to increase throughput. It also discusses the advantages of DDR3 memories over DDR2 memories. Double Data Rate (DDR) SDRAMs have been prevalent in the PC memory market in recent years and are widely used in networking systems. These memory devices are developing rapidly, offering high density, high memory bandwidth, and low device cost. However, because of the high-speed interface technology and complex instruction-based memory access control, a special-purpose memory controller is necessary to optimize the memory access trade-off. In this paper, a special-purpose DDR3 controller for high performance is proposed.

    Long-Term Memory for Cognitive Architectures: A Hardware Approach Using Resistive Devices

    A cognitive agent capable of reliably performing complex tasks over a long time will acquire a large store of knowledge. To interact with changing circumstances, the agent will need to quickly search and retrieve knowledge relevant to its current context. Real time knowledge search and cognitive processing like this is a challenge for conventional computers, which are not optimised for such tasks. This thesis describes a new content-addressable memory, based on resistive devices, that can perform massively parallel knowledge search in the memory array. The fundamental circuit block that supports this capability is a memory cell that closely couples comparison logic with non-volatile storage. By using resistive devices instead of transistors in both the comparison circuit and storage elements, this cell improves area density by over an order of magnitude compared to state of the art CMOS implementations. The resulting memory does not need power to maintain stored information, and is therefore well suited to cognitive agents with large long-term memories. The memory incorporates activation circuits, which bias the knowledge retrieval process according to past memory access patterns. This is achieved by approximating the widely used base-level activation function using resistive devices to store, maintain and compare activation values. By distributing an instance of this circuit to every row in memory, the activation for all memory objects can be updated in parallel. A test using the word sense disambiguation task shows this circuit-based activation model only incurs a small loss in accuracy compared to exact base-level calculations. A variation of spreading activation can also be achieved in-memory. Memory objects are encoded with high-dimensional vectors that create association between correlated representations. By storing these high-dimensional vectors in the new content-addressable memory, activation can be spread to related objects during search operations. 
    The new memory is scalable, power and area efficient, and performs operations in parallel that are infeasible in real-time for a sequential processor with a conventional memory hierarchy. Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
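
The abstract refers to "the widely used base-level activation function" without writing it out. Assuming this means the base-level learning equation from the ACT-R cognitive architecture, B = ln(sum over past accesses of t^(-d)), where t is the time since each access and d is a decay parameter, an exact software version might look like the sketch below. The function name, the default `decay=0.5`, and the sample timestamps are illustrative assumptions; the thesis approximates this calculation with resistive circuits rather than computing it this way.

```python
import math


def base_level_activation(access_times, now, decay=0.5):
    """Exact base-level activation: ln(sum((now - t) ** -decay)).

    access_times: timestamps of past accesses to one memory object.
    Accesses at or after `now` are ignored; with no usable access the
    activation is negative infinity (the object was never retrieved).
    """
    ages = [now - t for t in access_times if t < now]
    if not ages:
        return float("-inf")
    return math.log(sum(age ** -decay for age in ages))


# A recently used object is more active than one used long ago,
# and repeated use raises activation further.
recent = base_level_activation([9.0], now=10.0)
old = base_level_activation([1.0], now=10.0)
frequent = base_level_activation([1.0, 5.0, 9.0], now=10.0)
print(recent > old, frequent > recent)
```

Computing this for every object requires one pass per object on a sequential processor, which is the cost the per-row activation circuits in the abstract are meant to avoid.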

    Analog Content Addressable Memory

    Electrical Engineerin

    Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories

    The speed of modern digital systems is severely limited by memory latency (the “Memory Wall” problem). Data exchange between logic and memory is also responsible for a large part of the system energy consumption. Logic-in-Memory (LiM) represents an attractive solution to this problem. By performing part of the computations directly inside the memory, the system speed can be improved while reducing its energy consumption. The LiM solutions that offer the largest boost in performance are based on modifying the memory cell. However, what is the cost of such modifications? How do they impact the memory array performance? In this work, these questions are addressed by analysing a LiM memory array implementing an algorithm for maximum/minimum value computation. The memory array is designed at the physical level using the FreePDK 45nm CMOS process, with three memory cell variants, and its performance is compared to SRAM and CAM memories. Results highlight that the performance of read and write operations is worsened, but in-memory operations prove to be very efficient: a 55.26% reduction in the energy-delay product is measured for the AND operation with respect to the SRAM read operation. Therefore, the LiM approach represents a very promising solution for low-density and high-performance memories.

    Variable length pattern coding for power reduction in off-chip data buses

    Off-chip buses consume a huge fraction (20%-40%) of the system power. Hence, techniques such as increasing bus widths, transition encoding, etc. have been used for power reduction on off-chip data buses. Since capacitances at the I/O pads and interwire capacitances contribute significantly to the increase in power, encoding/decoding schemes have been developed to reduce the switching activity of the off-chip bus lines, thus reducing power. Frequent-Value Encoding (FVE) [1], Frequent Value Encoding with Xor (FVExor) [1] and VALVE [2] are some of the better-known encoding schemes, but they still have scope for improvement. This thesis addresses the problem of power reduction in off-chip data buses by encoding a variable number (1 to 4) of fixed-size (32-bit) data values (variable length patterns) which exhibit temporal locality. This characteristic enables us to cache these patterns using a 64-entry CAM at the encoder and a 64-entry SRAM at the decoder. Whenever a pattern match occurs, a 2-bit code indicating the index of the match is sent. If a variable length pattern match occurs, then the code and the unmatched portion of the data are sent. We implemented our scheme, Variable Length Pattern Coding (VLPC), for various integer and floating point benchmarks and have seen 6% to 49% encodable patterns in these benchmarks. Based on experiments on SimpleScalar and our analysis in MATLAB, we obtained a 4.88% to 40.11% reduction in transition activity for SPEC2000 benchmarks such as crafty, swim, mcf, applu, ammp, etc. over unencoded data. This is 0.3% to 38.9% higher than that obtained using the FVE, FVExor [1] and VALVE [2] encoding schemes. Finally, we have designed a low-power custom CAM and SRAM using 45nm BSIM4 technology models, which have been used to verify the lower latency of data matching and storing.
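
The metric reported in this abstract, transition activity, is the number of bus lines that toggle between consecutive words. A minimal way to measure it and to compute a percentage reduction is sketched below. The function names, the assumption that the bus idles at zero before the first word, and the sample streams are all illustrative; the sketch does not implement VLPC, FVE, FVExor, or VALVE, and it ignores any extra control lines an encoder would add.

```python
def transitions(words, width=32):
    """Count bit flips on a `width`-bit bus over a sequence of words.

    Assumes every line is at 0 before the first word is driven.
    """
    mask = (1 << width) - 1
    total = 0
    prev = 0
    for word in words:
        word &= mask
        # XOR marks the lines that change; count the set bits.
        total += bin(prev ^ word).count("1")
        prev = word
    return total


def reduction_percent(unencoded, encoded, width=32):
    """Percentage drop in transitions of `encoded` relative to `unencoded`."""
    base = transitions(unencoded, width)
    if base == 0:
        return 0.0
    return 100.0 * (base - transitions(encoded, width)) / base


# Hypothetical streams: the "encoded" one holds the bus steady.
raw = [0xF, 0x0, 0xF]
steady = [0xF, 0xF, 0xF]
print(transitions(raw), transitions(steady))
print(round(reduction_percent(raw, steady), 2))
```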