19 research outputs found

    DRC²: Dynamically Reconfigurable Computing Circuit based on Memory Architecture

    This paper presents a novel energy-efficient and Dynamically Reconfigurable Computing Circuit (DRC²) concept based on memory architecture for data-intensive (e.g., imaging) and secure (e.g., cryptography) applications. The proposed computing circuit is based on a 10-Transistor (10T) 3-Port SRAM bitcell array driven by peripheral circuitry enabling all basic operations traditionally performed by an ALU. As a result, logic and arithmetic operations can be executed entirely within the memory unit, significantly reducing the power consumed by data transfers between memories and computing units. Moreover, the proposed computing circuit can perform massively parallel operations, enabling the processing of large volumes of data. A test case based on an image processing application using the saturating increment function is analytically modeled to compare the conventional and DRC²-based approaches. The DRC²-based approach is shown to reduce the clock-cycle count by up to 2x. Finally, potential applications and the changes to be considered at different design levels are discussed.
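    The saturating increment used in the test case clamps a value at its maximum instead of wrapping around. Below is a minimal Python sketch of the operation; the function name, the 8-bit pixel range, and the sample image are illustrative assumptions, since the paper models the operation analytically rather than in software.

        def saturating_increment(pixel: int, max_value: int = 255) -> int:
            """Increment a pixel value, clamping at the maximum instead of wrapping."""
            return pixel + 1 if pixel < max_value else max_value

        # A conventional CPU applies this one pixel at a time; an in-memory
        # circuit such as DRC2 would apply it to many stored words in parallel.
        image = [0, 127, 254, 255]
        print([saturating_increment(p) for p in image])  # [1, 128, 255, 255]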

    X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories

    Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic have been the workhorses of state-of-the-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying von Neumann computing architecture has remained unchanged. The limited throughput and energy efficiency of state-of-the-art computing systems result, to a large extent, from the well-known von Neumann bottleneck. The energy and throughput inefficiency of von Neumann machines has been accentuated in recent times by the emphasis on data-intensive applications such as artificial intelligence and machine learning. A possible approach to mitigating the overhead associated with the von Neumann bottleneck is to enable in-memory Boolean computations. In this manuscript, we present an augmented version of conventional SRAM bit-cells, called the X-SRAM, with the ability to perform in-memory vector Boolean computations in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations, including NAND, NOR, IMP (implication), and XOR logic gates, with respect to different bit-cell topologies: the 8T cell and the 8+T differential cell. In addition, we present a novel 'read-compute-store' scheme, wherein the computed Boolean function can be stored directly in the memory without the need to latch the data and carry out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and Monte Carlo variation analysis. (This article has been accepted for publication in IEEE Transactions on Circuits and Systems I: Regular Papers.)
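    As a functional illustration of the vector Boolean operations X-SRAM computes between two stored words, the following Python sketch models NAND, NOR, IMP, and XOR on 8-bit words. The word width and memory contents are assumptions, and the circuit-level mechanics are of course not captured by this model.

        WIDTH = 8
        MASK = (1 << WIDTH) - 1

        def nand(a: int, b: int) -> int: return ~(a & b) & MASK
        def nor(a: int, b: int) -> int:  return ~(a | b) & MASK
        def imp(a: int, b: int) -> int:  return (~a | b) & MASK  # a IMPLIES b
        def xor(a: int, b: int) -> int:  return (a ^ b) & MASK

        # 'read-compute-store': the computed word goes straight back into the
        # array (modeled here as an assignment) with no latch-then-write step.
        memory = [0b10101010, 0b11001100, 0b00000000]
        memory[2] = xor(memory[0], memory[1])
        print(f"{memory[2]:08b}")  # 01100110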

    An Optical Content Addressable Memory Cell for Address Look-Up at 10 Gb/s


    High-Performance Pre-computation-Based Self-Controlled Precharge-Free Content-Addressable Memory

    Content-addressable memory (CAM) is a special type of memory used in networking applications for very-high-speed search operations. It compares input search data against a table of stored data and returns the address of the matching data using a parallel search. Although parallel comparison reduces search time, it also significantly increases power consumption compared with precharge-based CAM. Both the low-power NAND-type and the high-speed NOR-type CAM require a precharge phase prior to the search; this phase increases the settling time of the output and reduces the speed of the search operation. In this paper, a high-performance Pre-computation-Based Self-Controlled Precharge-Free CAM (PB-SCPF CAM) structure is proposed for high-speed applications; it reduces the settling time and improves search speed. The SCPF architecture is effective in applications with large word lengths, where search time is critical. The experimental results show that the PB-SCPF approach attains, on average, a 32% reduction in power and an 80% reduction in delay. The most important contribution of this work is that it offers theoretical and practical proof that the proposed PB-SCPF CAM can achieve greater power reduction without requiring a special CAM cell design, which shows that the approach is flexible and adaptable to general designs and high-speed applications.
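    The following Python sketch shows the behavior of the CAM search operation: every stored word is compared against the search key and the addresses of all matches are returned. Python iterates sequentially, whereas the CAM hardware compares all rows in parallel in a single search cycle; the table contents are made-up examples.

        def cam_search(table: list[int], key: int) -> list[int]:
            """Return the addresses of all stored words equal to the search key."""
            return [addr for addr, word in enumerate(table) if word == key]

        table = [0x1A2B, 0x3C4D, 0x1A2B, 0x5E6F]
        print(cam_search(table, 0x1A2B))  # [0, 2]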

    BLADE: A BitLine Accelerator for Devices on the Edge

    The increasing ubiquity of edge devices in the consumer market, along with their ever more computationally expensive workloads, necessitates corresponding increases in computing power to support such workloads. In-memory computing is attractive in edge devices as it reuses preexisting memory elements, thus limiting area overhead. Additionally, in-SRAM Computing (iSC) efficiently performs computations on spatially local data found in a variety of emerging edge device workloads. We therefore propose, implement, and benchmark BLADE, a BitLine Accelerator for Devices on the Edge. BLADE is an iSC architecture that can perform massive SIMD-like complex operations on hundreds to thousands of operands simultaneously. We implement BLADE in 28nm CMOS and demonstrate its functionality down to 0.6V, lower than any conventional state-of-the-art iSC architecture. We also benchmark BLADE in conjunction with a full Linux software stack in the gem5 architectural simulator, providing a robust demonstration of its performance gain over an equivalent embedded processor equipped with a NEON SIMD co-processor. We benchmark BLADE with three emerging edge device workloads, namely cryptography, high efficiency video coding, and convolutional neural networks, and demonstrate 4x, 6x, and 3x performance improvements, respectively, over a baseline CPU/NEON processor at an equivalent power budget.
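    BLADE builds on bitline computing, in which two SRAM rows are activated simultaneously so that the shared bitline pair evaluates bitwise functions of the two rows; in typical in-SRAM computing schemes, the true bitline yields AND and the complementary bitline yields NOR. The Python sketch below models only that logical behavior; the row width and values are assumptions, and BLADE's actual circuit techniques (e.g., its low-voltage operation) are not captured.

        MASK = 0xFF  # assume 8-bit rows for illustration

        def bitline_compute(row_a: int, row_b: int) -> tuple[int, int]:
            """Return (AND, NOR) of two rows, as sensed on the bitline pair."""
            return row_a & row_b, ~(row_a | row_b) & MASK

        a, b = 0b11110000, 0b10101010
        and_res, nor_res = bitline_compute(a, b)
        print(f"{and_res:08b} {nor_res:08b}")  # 10100000 00000101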

    Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories

    The speed of modern digital systems is severely limited by memory latency (the "Memory Wall" problem). Data exchange between logic and memory is also responsible for a large part of the system's energy consumption. Logic-in-Memory (LiM) is an attractive solution to this problem: by performing part of the computation directly inside the memory, system speed can be improved while reducing energy consumption. The LiM solutions that offer the greatest boost in performance are based on modifying the memory cell. However, what is the cost of such modifications, and how do they affect memory array performance? In this work, these questions are addressed by analysing a LiM memory array implementing an algorithm for maximum/minimum value computation. The memory array is designed at the physical level using the FreePDK 45nm CMOS process, with three memory cell variants, and its performance is compared with SRAM and CAM memories. The results highlight that read and write performance is degraded, but in-memory operations prove very efficient: a 55.26% reduction in the energy-delay product is measured for the AND operation with respect to an SRAM read. The LiM approach therefore represents a very promising solution for low-density, high-performance memories.
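    A common way to compute a maximum in-memory, and the kind of algorithm such a LiM array targets, is a bit-serial scan from the most significant bit down: at each position, candidates holding a 0 are eliminated whenever any candidate holds a 1. The Python sketch below models that behavior; the word width and values are examples, and the paper's exact cell-level implementation may differ.

        def in_memory_max(words: list[int], width: int = 8) -> list[int]:
            """Return the indices of the maximum value among the stored words."""
            candidates = list(range(len(words)))
            for bit in reversed(range(width)):
                hits = [i for i in candidates if (words[i] >> bit) & 1]
                if hits:               # a 1 exists at this bit position:
                    candidates = hits  # drop every candidate holding a 0
            return candidates

        words = [0b0110, 0b1011, 0b1010, 0b1011]
        print(in_memory_max(words, width=4))  # [1, 3]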