299 research outputs found

    Low-Power High-Performance Ternary Content Addressable Memory Circuits

    Get PDF
    Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability. They are attractive for applications such as packet forwarding and classification in network routers. Despite the attractive features of TCAMs, high power consumption is one of the most critical challenges faced by TCAM designers. This work proposes circuit techniques for reducing TCAM power consumption. The main contribution of this work is divided in two parts: (i) reduction in match line (ML) sensing energy, and (ii) static-power reduction techniques. The ML sensing energy is reduced by employing (i) positive-feedback ML sense amplifiers (MLSAs), (ii) low-capacitance comparison logic, and (iii) low-power ML-segmentation techniques. The positive-feedback MLSAs include both resistive and active feedback to reduce the ML sensing energy. A body-bias technique can further improve the feedback action at the expense of additional area and ML capacitance. The measurement results of the active-feedback MLSA show 50-56% reduction in ML sensing energy. The measurement results of the proposed low-capacitance comparison logic show 25% and 42% reductions in ML sensing energy and time, respectively, which can further be improved by careful layout. The low-power ML-segmentation techniques include dual ML TCAM and charge-shared ML. Simulation results of the dual ML TCAM that connects two sides of the comparison logic to two ML segments for sequential sensing show 43% power savings for a small (4%) trade-off in the search speed. The charge-shared ML scheme achieves power savings by partial recycling of the charge stored in the first ML segment. Chip measurement results show that the charge-shared ML scheme results in 11% and 9% reductions in ML sensing time and energy, respectively, which can be improved to 19-25% by using a digitally controlled charge sharing time-window and a slightly modified MLSA. The static power reduction is achieved by a dual-VDD technique and low-leakage TCAM cells. The dual-VDD technique trades-off the excess noise margin of MLSA for smaller cell leakage by applying a smaller VDD to TCAM cells and a larger VDD to the peripheral circuits. The low-leakage TCAM cells trade off the speed of READ and WRITE operations for smaller cell area and leakage. Finally, design and testing of a complete TCAM chip are presented, and compared with other published designs

    A CMOS Time to Digital Converter IC with 2 Level Analog CAM

    Get PDF
    A time to charge converter IC with an analog memory unit (TCCAMU) has been designed and fabricated in HP\u27s CMOS 1.2-”m n-well process. The TCCAMU is an event driven system designed for front end data acquisition in high energy physics experiments. The chip includes a time to charge converter, analog Level 1 and Level 2 associative memories for input pipelining and data filtering, and an A/D converter. The intervals measured and digitized range from 8-24 ns. Testing of the fabricated chip resulted in an LSB width of 107 ps, a typical differential nonlinearity of \u3c 35 ps, and a typical integral nonlinearity of \u3c 200 ps. The average power dissipation is 8.28 mW per channel. By counting the reference clock, a time resolution of 107 ps over ~ 1 s range could be realized

    Novel low power CAM architecture

    Get PDF
    One special type of memory use for high speed address lookup in router or cache address lookup in a processor is Content Addressable Memory (CAM). CAM can also be used in pattern recognition applications where a unique pattern needs to be determined if a match is found. CAM has an additional comparison circuit in each memory bit compared to Static Random Access Memory. This comparison circuit provides CAM with an additional capability for searching the entire memory in one clock cycle. With its hardware parallel comparison architecture, it makes CAM an ideal candidate for any high speed data lookup or for address processing applications. Because of its high power demand nature, CAM is not often used in a mobile device. To take advantage of CAM on portable devices, it is necessary to reduce its power consumption. It is for this reason that much research has been conducted on investigating different methods and techniques for reducing the overall power. The objective is to incorporate and utilize circuit and power reduction techniques in a new architecture to further reduce CAM’s energy consumption. The new CAM architecture illustrates the reduction of both dynamic and static power dissipation at 65nm sub-micron environment. This thesis will present a novel CAM architecture, which will reduce power consumption significantly compared to traditional CAM architecture, with minimal or no performance losses. Comparisons with other previously proposed architectures will be presented when implementing these designs under 65nm process environment. Results show the novel CAM architecture only consumes 4.021mW of power compared to the traditional CAM architecture of 12.538mW at 800MHz frequency and is more energy efficient over all other previously proposed designs

    An integrated associative processing system

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.Includes bibliographical references (p. 97-105).by Frederick Paul Herrmann.Ph.D

    One Step in-Memory Solution of Inverse Algebraic Problems

    Get PDF
    AbstractMachine learning requires to process large amount of irregular data and extract meaningful information. Von-Neumann architecture is being challenged by such computation, in fact a physical separation between memory and processing unit limits the maximum speed in analyzing lots of data and the majority of time and energy are spent to make information travel from memory to the processor and back. In-memory computing executes operations directly within the memory without any information travelling. In particular, thanks to emerging memory technologies such as memristors, it is possible to program arbitrary real numbers directly in a single memory device in an analog fashion and at the array level, execute algebraic operation in-memory and in one step. In this chapter the latest results in accelerating inverse operation, such as the solution of linear systems, in-memory and in a single computational cycle will be presented

    A CMOS time to digital converter IC with 2 level analog CAM

    Full text link

    Energy-Efficiency in Optical Networks

    Get PDF

    Device and Circuit Architectures for In‐Memory Computing

    Get PDF
    With the rise in artificial intelligence (AI), computing systems are facing new challenges related to the large amount of data and the increasing burden of communication between the memory and the processing unit. In‐memory computing (IMC) appears as a promising approach to suppress the memory bottleneck and enable higher parallelism of data processing, thanks to the memory array architecture. As a result, IMC shows a better throughput and lower energy consumption with respect to the conventional digital approach, not only for typical AI tasks, but also for general‐purpose problems such as constraint satisfaction problems (CSPs) and linear algebra. Herein, an overview of IMC is provided in terms of memory devices and circuit architectures. First, the memory device technologies adopted for IMC are summarized, focusing on both charge‐based memories and emerging devices relying on electrically induced material modification at the chemical or physical level. Then, the computational memory programming and the corresponding device nonidealities are described with reference to offline and online training of IMC circuits. Finally, array architectures for computing are reviewed, including typical architectures for neural network accelerators, content addressable memory (CAM), and novel circuit topologies for general‐purpose computing with low complexity

    Memory Management for Emerging Memory Technologies

    Get PDF
    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system
    • 

    corecore