436 research outputs found

    RRAM variability and its mitigation schemes

    Get PDF
    Emerging technologies such as RRAMs are attracting significant attention due to their tempting characteristics such as high scalability, CMOS compatibility and non-volatility to replace the current conventional memories. However, critical causes of hardware reliability failures, such as process variation due to their nano-scale structure have gained considerable importance for acceptable memory yields. Such vulnerabilities make it essential to investigate new robust design strategies at the circuit system level. In this paper we have analyzed the RRAM variability phenomenon, its impact and variation tolerant techniques at the circuit level. Finally a variation-monitoring circuit is presented that discerns the reliable memory cells affected by process variability.Peer ReviewedPostprint (author's final draft

    Configurable Operational Amplifier Architectures Based on Oxide Resistive RAMs

    Get PDF
    International audienceThis paper introduces memristor-based operational amplifiers in which semiconductor resistors are suppressed and replaced by memristors. The ability of the memristive elements to hold several resistance states is exploited to design programmable closed-loop operational amplifiers. An inverting operational amplifier, an integrator and a differentiator are studied. Such designs are developed based on a calibrated memristor model, and offer dynamic configurability to realize different gains and corner frequencies at reduced chip area

    Public-Key Based Authentication Architecture for IoT Devices Using PUF

    Full text link
    Nowadays, Internet of Things (IoT) is a trending topic in the computing world. Notably, IoT devices have strict design requirements and are often referred to as constrained devices. Therefore, security techniques and primitives that are lightweight are more suitable for such devices, e.g., Static Random-Access Memory (SRAM) Physical Unclonable Functions (PUFs) and Elliptic Curve Cryptography (ECC). SRAM PUF is an intrinsic security primitive that is seeing widespread adoption in the IoT segment. ECC is a public-key algorithm technique that has been gaining popularity among constrained IoT devices. The popularity is due to using significantly smaller operands when compared to other public-key techniques such as RSA (Rivest Shamir Adleman). This paper shows the design, development, and evaluation of an application-specific secure communication architecture based on SRAM PUF technology and ECC for constrained IoT devices. More specifically, it introduces an Elliptic Curve Diffie-Hellman (ECDH) public-key based cryptographic protocol that utilizes PUF-derived keys as the root-of-trust for silicon authentication. Also, it proposes a design of a modular hardware architecture that supports the protocol. Finally, to analyze the practicality as well as the feasibility of the proposed protocol, we demonstrate the solution by prototyping and verifying a protocol variant on the commercial Xilinx Zynq-7000 APSoC device

    BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory

    Get PDF
    Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on energy and computing power. Contrary to conventional neural networks using floating-point datatypes, BNNs use binarized weights and activations to reduce memory and computation requirements. Memristors, emerging non-volatile memory devices, show great potential as a target implementation platform for BNNs by integrating storage and compute units. However, the efficiency of this hardware highly depends on how the network is mapped and executed on these devices. In this paper, we propose an efficient implementation of XNOR-based BNN to maximize parallelization. In this implementation, costly analog-to-digital converters are replaced with sense amplifiers with custom reference(s) to generate activation values. Besides, a novel mapping is introduced to minimize the overhead of data communication between convolution layers mapped to different memristor crossbars. This comes with extensive analytical and simulation-based analysis to evaluate the implication of different design choices considering the accuracy of the network. The results show that our approach achieves up to 5 × energy-saving and 100 × improvement in latency compared to baselines

    An In-Memory Architecture for High-Performance Long-Read Pre-Alignment Filtering

    Full text link
    With the recent move towards sequencing of accurate long reads, finding solutions that support efficient analysis of these reads becomes more necessary. The long execution time required for sequence alignment of long reads negatively affects genomic studies relying on sequence alignment. Although pre-alignment filtering as an extra step before alignment was recently introduced to mitigate sequence alignment for short reads, these filters do not work as efficiently for long reads. Moreover, even with efficient pre-alignment filters, the overall end-to-end (i.e., filtering + original alignment) execution time of alignment for long reads remains high, while the filtering step is now a major portion of the end-to-end execution time. Our paper makes three contributions. First, it identifies data movement of sequences between memory units and computing units as the main source of inefficiency for pre-alignment filters of long reads. This is because although filters reject many of these long sequencing pairs before they get to the alignment stage, they still require a huge cost regarding time and energy consumption for the large data transferred between memory and processor. Second, this paper introduces an adaptation of a short-read pre-alignment filtering algorithm suitable for long reads. We call this LongGeneGuardian. Finally, it presents Filter-Fuse as an architecture that supports LongGeneGuardian inside the memory. FilterFuse exploits the Computation-In-Memory computing paradigm, eliminating the cost of data movement in LongGeneGuardian. Our evaluations show that FilterFuse improves the execution time of filtering by 120.47x for long reads compared to State-of-the-Art (SoTA) filter, SneakySnake. FilterFuse also improves the end-to-end execution time of sequence alignment by up to 49.14x and 5207.63x compared to SneakySnake with SoTA aligner and only SoTA aligner, respectively

    High-Performance Data Mapping for BNNs on PCM-based Integrated Photonics

    Full text link
    State-of-the-Art (SotA) hardware implementations of Deep Neural Networks (DNNs) incur high latencies and costs. Binary Neural Networks (BNNs) are potential alternative solutions to realize faster implementations without losing accuracy. In this paper, we first present a new data mapping, called TacitMap, suited for BNNs implemented based on a Computation-In-Memory (CIM) architecture. TacitMap maximizes the use of available parallelism, while CIM architecture eliminates the data movement overhead. We then propose a hardware accelerator based on optical phase change memory (oPCM) called EinsteinBarrier. Ein-steinBarrier incorporates TacitMap and adds an extra dimension for parallelism through wavelength division multiplexing, leading to extra latency reduction. The simulation results show that, compared to the SotA CIM baseline, TacitMap and EinsteinBarrier significantly improve execution time by up to ~154x and ~3113x, respectively, while also maintaining the energy consumption within 60% of that in the CIM baseline.Comment: To appear in Design Automation and Test in Europe (DATE), 202
    corecore