High Performance Pre-computation based Self-Controlled Precharge-Free Content-Addressable Memory
Content-addressable memory (CAM) is a special type of memory used in networking applications for very-high-speed search operations. It compares input search data against a table of stored data and returns the address of the matching data in a single parallel search. Although this parallel comparison reduces search time, it also significantly increases power consumption compared to precharge-based CAM. Both the low-power NAND-type and the high-speed NOR-type CAM require a precharge phase prior to the search; this phase increases the settling time of the output and reduces the speed of the search operation. In this paper, a high-performance Pre-computation-Based Self-Controlled Precharge-Free CAM (PB-SCPF CAM) structure is proposed for high-speed applications; it reduces the settling time and improves search speed. The SCPF architecture is efficacious in applications where search time is critical, such as designs with large word lengths. The experimental results show that the PB-SCPF approach attains on average a 32% reduction in power and an 80% reduction in delay. The most important contribution of this work is that it offers theoretical and practical proof that the proposed PB-SCPF CAM achieves greater power reduction without requiring a special CAM cell design, which shows that the approach is flexible and adaptable to general designs and high-speed applications.
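The parallel-search behaviour described above can be sketched in software. The model below is illustrative only (the class and method names are not from the paper): every stored row is compared against the search key, and the addresses of all matching rows are returned, which is what a CAM does in a single hardware cycle.

```python
# Behavioural sketch of a binary CAM search (illustrative, not the
# paper's circuit). In hardware every row compares against the key
# in parallel; here that is modelled with a comprehension.

class CamModel:
    def __init__(self, width):
        self.width = width
        self.rows = []          # stored words, one per address

    def write(self, word):
        self.rows.append(word)  # address = row index

    def search(self, key):
        # Return every address whose stored word equals the key.
        return [addr for addr, word in enumerate(self.rows) if word == key]

cam = CamModel(width=8)
for w in (0b1010_0001, 0b1111_0000, 0b1010_0001):
    cam.write(w)
print(cam.search(0b1010_0001))  # → [0, 2]
```

A RAM answers "what is at address A?"; a CAM answers the inverse query "at which addresses is value V?", and the power cost comes from every row comparing on every search.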
Compact and High-Performance TCAM Based on Scaled Double-Gate FeFETs
Ternary content addressable memory (TCAM), widely used in network routers and
high-associativity caches, is gaining popularity in machine learning and
data-analytic applications. Ferroelectric FETs (FeFETs) are a promising
candidate for implementing TCAM owing to their high ON/OFF ratio,
non-volatility, and CMOS compatibility. However, conventional single-gate
FeFETs (SG-FeFETs) suffer from relatively high write voltage, low endurance,
potential read disturbance, and face scaling challenges. Recently, a
double-gate FeFET (DG-FeFET) has been proposed and outperforms SG-FeFETs in
many aspects. This paper investigates TCAM design challenges specific to
DG-FeFETs and introduces a novel 1.5T1Fe TCAM design based on DG-FeFETs. A
2-step search with early termination is employed to reduce the cell area and
improve energy efficiency. A shared driver design is proposed to reduce the
peripherals area. Detailed analysis and SPICE simulation show that the 1.5T1Fe
DG-TCAM leads to superior search speed and energy efficiency. The 1.5T1Fe TCAM
design can also be built with SG-FeFETs, achieving improved search latency and
energy compared with 2FeFET TCAM.
Comment: Accepted by Design Automation Conference (DAC) 202
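A TCAM differs from a binary CAM in that each stored bit can also be a don't-care ('X') that matches either input bit, and the abstract's two-step search ends early for rows that already mismatch. The sketch below illustrates both ideas at the behavioural level; it is an assumption-laden model, not the paper's 1.5T1Fe circuit.

```python
# Behavioural sketch of a TCAM match with don't-care ('X') bits and a
# two-step search: check the first half of the key, and skip the second
# step for rows that already mismatch (the early-termination idea).
# Illustrative only; not the 1.5T1Fe DG-FeFET implementation.

def row_matches(stored, key, lo, hi):
    # stored: string over {'0', '1', 'X'}; 'X' matches either bit.
    return all(s in ('X', k) for s, k in zip(stored[lo:hi], key[lo:hi]))

def tcam_search(rows, key):
    half = len(key) // 2
    matches = []
    for addr, stored in enumerate(rows):
        if not row_matches(stored, key, 0, half):
            continue                      # early termination: skip step 2
        if row_matches(stored, key, half, len(key)):
            matches.append(addr)
    return matches

rows = ["10XX", "1011", "0XXX"]
print(tcam_search(rows, "1010"))  # → [0]
```

The energy saving comes from the second comparison step only being exercised for the (usually few) rows that survive the first.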
An Energy-Efficient Design Paradigm for a Memory Cell Based on Novel Nanoelectromechanical Switches
In this chapter, we explain the NEMsCAM cell, a new content-addressable memory (CAM) cell designed with both CMOS technology and nanoelectromechanical (NEM) switches. The memory part of NEMsCAM consists of two complementary nonvolatile NEM switches and is located on top of the CMOS-based comparison component. As a use case, we evaluate first-level instruction and data translation lookaside buffers (TLBs) in 16 nm CMOS technology at 2 GHz. The simulation results demonstrate that the NEMsCAM TLB reduces the energy consumption per search operation (by 27%), in standby mode (by 53.9%), and per write operation (by 41.9%), as well as the area (by 40.5%), compared to a CMOS-only TLB, with minimal performance overhead.
A High Performance DDR3 SDRAM Controller
The paper presents the implementation of a compliant DDR3 memory controller. It discusses the overall architecture of the DDR3 controller along with the detailed design and operation of its individual sub-blocks, and the pipelining implemented in the design to increase throughput. It also discusses the advantages of DDR3 memories over DDR2 memories. Double Data Rate (DDR) SDRAMs have been prevalent in the PC memory market in recent years and are widely used in networking systems. These memory devices are developing rapidly, with high density, high memory bandwidth, and low device cost. However, because of the high-speed interface technology and complex instruction-based memory access control, a special-purpose memory controller is necessary for optimizing the memory access trade-off. In this paper, a special-purpose DDR3 controller for high performance is proposed.
Long-Term Memory for Cognitive Architectures: A Hardware Approach Using Resistive Devices
A cognitive agent capable of reliably performing complex tasks over a long time will acquire a large store of knowledge. To interact with changing circumstances, the agent will need to quickly search and retrieve knowledge relevant to its current context. Real time knowledge search and cognitive processing like this is a challenge for conventional computers, which are not optimised for such tasks. This thesis describes a new content-addressable memory, based on resistive devices, that can perform massively parallel knowledge search in the memory array. The fundamental circuit block that supports this capability is a memory cell that closely couples comparison logic with non-volatile storage. By using resistive devices instead of transistors in both the comparison circuit and storage elements, this cell improves area density by over an order of magnitude compared to state of the art CMOS implementations. The resulting memory does not need power to maintain stored information, and is therefore well suited to cognitive agents with large long-term memories. The memory incorporates activation circuits, which bias the knowledge retrieval process according to past memory access patterns. This is achieved by approximating the widely used base-level activation function using resistive devices to store, maintain and compare activation values. By distributing an instance of this circuit to every row in memory, the activation for all memory objects can be updated in parallel. A test using the word sense disambiguation task shows this circuit-based activation model only incurs a small loss in accuracy compared to exact base-level calculations. A variation of spreading activation can also be achieved in-memory. Memory objects are encoded with high-dimensional vectors that create association between correlated representations. By storing these high-dimensional vectors in the new content-addressable memory, activation can be spread to related objects during search operations. 
The new memory is scalable, power and area efficient, and performs operations in parallel that are infeasible in real-time for a sequential processor with a conventional memory hierarchy.
Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
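The base-level activation function that the thesis's analogue circuit approximates has a standard form (the ACT-R formulation): an object's activation is the log of a sum of power-law-decayed contributions from its past accesses. The exact software reference below can serve as the baseline the circuit is compared against; the decay parameter d = 0.5 is the conventional choice and an assumption here.

```python
# Exact base-level activation (ACT-R form), the function the thesis's
# resistive circuit approximates:  B = ln( sum_j (now - t_j)^-d )
# d = 0.5 is the conventional decay parameter (an assumption here).
import math

def base_level_activation(access_times, now, d=0.5):
    # Each past access at time t contributes (now - t)^-d; recent
    # accesses contribute more, and contributions decay as a power law.
    return math.log(sum((now - t) ** -d for t in access_times))

# An object accessed recently and often activates more strongly,
# biasing retrieval toward it during a parallel search.
recent = base_level_activation([90, 95, 99], now=100)
stale  = base_level_activation([10, 20, 30], now=100)
assert recent > stale
```

Distributing one such computation per memory row, as the thesis does in circuitry, lets every object's activation update in parallel rather than sequentially.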
Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories
The speed of modern digital systems is severely limited by memory latency (the “Memory Wall” problem). Data exchange between logic and memory is also responsible for a large part of the system energy consumption. Logic-in-Memory (LiM) represents an attractive solution to this problem: by performing part of the computations directly inside the memory, the system speed can be improved while reducing its energy consumption. The LiM solutions that offer the largest boost in performance are based on modifying the memory cell. However, what is the cost of such modifications, and how do they impact the memory array performance? In this work, this question is addressed by analysing a LiM memory array implementing an algorithm for maximum/minimum value computation. The memory array is designed at the physical level using the FreePDK 45nm CMOS process, with three memory cell variants, and its performance is compared to SRAM and CAM memories. Results highlight that read and write performance is worsened, but in-memory operations prove very efficient: a 55.26% reduction in the energy-delay product is measured for the AND operation with respect to the SRAM read. Therefore, the LiM approach represents a very promising solution for low-density, high-performance memories.
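The maximum/minimum computation mentioned above is typically done in LiM arrays with a bit-serial search: scan from the most significant bit, and at each position prune the candidate rows whose bit is 0 whenever at least one candidate has a 1 (each pruning step is essentially an in-memory AND across a bit column). The sketch below illustrates that general technique, not the paper's specific circuit.

```python
# Behavioural sketch of the bit-serial maximum search used in
# logic-in-memory arrays: scan from the MSB and keep only candidate
# rows whose current bit is 1, whenever any candidate has a 1.
# Illustrative of the technique, not the paper's FreePDK45 design.

def in_memory_max(words, width):
    candidates = set(range(len(words)))
    for bit in reversed(range(width)):              # MSB first
        ones = {i for i in candidates if (words[i] >> bit) & 1}
        if ones:                                    # prune rows with a 0
            candidates = ones
    return candidates                               # rows holding the max

words = [0b0110, 0b1011, 0b1001, 0b1011]
print(in_memory_max(words, width=4))  # → {1, 3}
```

The search takes one column operation per bit of word width, independent of the number of stored words, which is why it maps so well onto a memory array that can evaluate a whole bit column at once.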
Variable length pattern coding for power reduction in off-chip data buses
Off-chip buses consume a large fraction (20%-40%) of total system power. Hence, techniques
such as increasing bus widths, transition encoding, etc., have been used for
power reduction on off-chip data buses. Since capacitances at the I/O pads and interwire
capacitances contribute significantly to power, encoding/decoding
schemes have been developed to reduce the switching activity of the off-chip bus lines,
thus reducing power. Frequent-Value Encoding (FVE) [1], Frequent Value Encoding
with Xor (FVExor) [1], and VALVE [2] are some of the better-known encoding schemes,
but they still have scope for improvement.
This thesis addresses the problem of power reduction in off-chip data buses by
encoding variable number (1 to 4) of fixed-size (32-bit) data values (variable length
patterns) which exhibit temporal locality. This characteristic enables us to cache
these patterns using 64-entry CAM at the encoder and 64-entry SRAM at the decoder.
Whenever a pattern match occurs, a 2-bit code indicating the index of the match is
sent. If a variable-length pattern match occurs, the code and the unmatched portion
of the data are sent.
We implemented our scheme, Variable Length Pattern Coding (VLPC), for various
integer and floating-point benchmarks and observed 6% to 49% encodable patterns
in these benchmarks. Based on experiments on SimpleScalar and our analysis
in MATLAB, we obtained a 4.88% to 40.11% reduction in transition activity over unencoded
data for SPEC2000 benchmarks such as crafty, swim, mcf, applu, and ammp. This is 0.3% to 38.9% higher than that obtained using the FVE, FVExor [1], and
VALVE [2] encoding schemes. Finally, we designed a low-power custom CAM
and SRAM using 45nm BSIM4 technology models, which were used to verify the lower
latency of data matching and storing.