RRAM variability and its mitigation schemes
Emerging technologies such as RRAMs are attracting significant attention as candidates to replace current conventional memories, owing to tempting characteristics such as high scalability, CMOS compatibility, and non-volatility. However, critical causes of hardware reliability failures, such as process variation arising from their nano-scale structure, have gained considerable importance for achieving acceptable memory yields. Such vulnerabilities make it essential to investigate new robust design strategies at the circuit and system levels. In this paper we analyze the RRAM variability phenomenon, its impact, and variation-tolerant techniques at the circuit level. Finally, a variation-monitoring circuit is presented that distinguishes reliable memory cells from those affected by process variability.
Configurable Operational Amplifier Architectures Based on Oxide Resistive RAMs
This paper introduces memristor-based operational amplifiers in which semiconductor resistors are replaced by memristors. The ability of the memristive elements to hold several resistance states is exploited to design programmable closed-loop operational amplifiers. An inverting operational amplifier, an integrator, and a differentiator are studied. The designs are developed based on a calibrated memristor model and offer dynamic configurability to realize different gains and corner frequencies at reduced chip area.
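The programmability described above follows directly from the ideal closed-loop gain relation of an inverting amplifier, gain = -R_fb / R_in: programming the feedback memristor to a different resistance state changes the gain without changing the layout. A minimal sketch of that relation, with a hypothetical four-state memristor state table (the resistance values are illustrative assumptions, not the paper's calibrated model):

```python
# Hypothetical memristor state table (illustrative values, not from the
# paper): programmable state index -> feedback resistance in ohms.
MEMRISTOR_STATES = {0: 10e3, 1: 20e3, 2: 50e3, 3: 100e3}

R_IN = 10e3  # fixed input resistance (assumed value)

def inverting_gain(state: int, r_in: float = R_IN) -> float:
    """Ideal closed-loop gain of an inverting amplifier: -R_fb / R_in."""
    return -MEMRISTOR_STATES[state] / r_in

for s in sorted(MEMRISTOR_STATES):
    print(f"state {s}: gain = {inverting_gain(s):.1f}")
```

The same substitution in an integrator or differentiator moves the corner frequency 1/(2*pi*R*C) instead of the gain, which is how one device programs both parameters.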
Public-Key Based Authentication Architecture for IoT Devices Using PUF
Nowadays, Internet of Things (IoT) is a trending topic in the computing
world. Notably, IoT devices have strict design requirements and are often
referred to as constrained devices. Therefore, security techniques and
primitives that are lightweight are more suitable for such devices, e.g.,
Static Random-Access Memory (SRAM) Physical Unclonable Functions (PUFs) and
Elliptic Curve Cryptography (ECC). SRAM PUF is an intrinsic security primitive
that is seeing widespread adoption in the IoT segment. ECC is a public-key
cryptography technique that has been gaining popularity among constrained IoT
devices, owing to its significantly smaller operands compared to other
public-key techniques such as RSA (Rivest-Shamir-Adleman).
This paper shows the design, development, and evaluation of an
application-specific secure communication architecture based on SRAM PUF
technology and ECC for constrained IoT devices. More specifically, it
introduces an Elliptic Curve Diffie-Hellman (ECDH) public-key based
cryptographic protocol that utilizes PUF-derived keys as the root-of-trust for
silicon authentication. Also, it proposes a design of a modular hardware
architecture that supports the protocol. Finally, to analyze the practicality
as well as the feasibility of the proposed protocol, we demonstrate the
solution by prototyping and verifying a protocol variant on the commercial
Xilinx Zynq-7000 APSoC device.
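The core idea of the protocol above, an ECDH exchange whose private scalars are derived from PUF responses rather than stored in non-volatile memory, can be sketched in miniature. The toy curve below (y^2 = x^3 + 2x + 2 mod 17, a standard textbook example) and the hash-based PUF-to-scalar step are illustrative assumptions, not the paper's parameters: a real design would use a standardized curve such as NIST P-256 and a fuzzy extractor to stabilize the noisy SRAM PUF response.

```python
# Toy ECDH with PUF-derived private keys. Curve, generator, and the
# PUF-derivation step are illustrative assumptions, not the paper's design.
import hashlib

P, A = 17, 2                  # tiny curve y^2 = x^3 + 2x + 2 mod 17
G, ORDER = (5, 1), 19         # generator and the (prime) group order

def point_add(p1, p2):
    """Add two points on the curve; None represents the point at infinity."""
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication k * point."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def puf_to_scalar(puf_response: bytes) -> int:
    # Hash the raw PUF response into a nonzero private scalar
    # (a real design would apply a fuzzy extractor first).
    digest = hashlib.sha256(puf_response).digest()
    return int.from_bytes(digest, "big") % (ORDER - 1) + 1

# Two devices with hypothetical SRAM PUF power-up responses:
d_a = puf_to_scalar(b"device-A SRAM startup bits")
d_b = puf_to_scalar(b"device-B SRAM startup bits")
Q_a, Q_b = scalar_mult(d_a, G), scalar_mult(d_b, G)   # exchanged public keys
shared_a = scalar_mult(d_a, Q_b)                      # A's view of the secret
shared_b = scalar_mult(d_b, Q_a)                      # B's view of the secret
assert shared_a == shared_b
```

Because the private scalar is regenerated from the PUF at each power-up, no long-term key material needs to reside in storage, which is the root-of-trust property the protocol exploits.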
BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory
Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on energy and computing power. Unlike conventional neural networks using floating-point datatypes, BNNs use binarized weights and activations to reduce memory and computation requirements. Memristors, emerging non-volatile memory devices, show great potential as a target implementation platform for BNNs by integrating storage and compute units. However, the efficiency of this hardware highly depends on how the network is mapped and executed on these devices. In this paper, we propose an efficient implementation of XNOR-based BNNs that maximizes parallelization. In this implementation, costly analog-to-digital converters are replaced with sense amplifiers with custom reference(s) to generate activation values. Besides, a novel mapping is introduced to minimize the overhead of data communication between convolution layers mapped to different memristor crossbars. This comes with extensive analytical and simulation-based analysis to evaluate the implications of different design choices on the accuracy of the network. The results show that our approach achieves up to 5x energy savings and a 100x improvement in latency compared to the baselines.
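The XNOR-based computation and the sense-amplifier replacement described above can be sketched functionally (our illustration, not the paper's circuit): a binary dot product reduces to XNOR plus popcount, and binarizing the result is a single comparison against a reference level, which is why a thresholding sense amplifier can stand in for a full ADC.

```python
# Functional sketch of XNOR-popcount with a thresholding sense amplifier
# in place of an ADC (illustrative model, not the paper's circuit).

def xnor_popcount(weights, activations):
    """Binary dot product: XNOR each bit pair, count matches, rescale to
    the equivalent +/-1 dot product via 2*popcount - n."""
    n = len(weights)
    popcount = sum(1 for w, a in zip(weights, activations) if w == a)
    return 2 * popcount - n

def sense_amplifier(value, reference=0):
    """Binarized activation: compare against a custom reference level."""
    return 1 if value >= reference else 0

w = [1, 0, 1, 1, 0, 1, 0, 0]   # binarized weights (1 -> +1, 0 -> -1)
x = [1, 1, 1, 0, 0, 1, 0, 1]   # binarized input activations
y = xnor_popcount(w, x)
print(y, sense_amplifier(y))
```

In the in-memory version, the popcount is the accumulated bitline current of a crossbar column, so the only digital step left is the reference comparison.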
An In-Memory Architecture for High-Performance Long-Read Pre-Alignment Filtering
With the recent move towards sequencing of accurate long reads, finding
solutions that support efficient analysis of these reads becomes more
necessary. The long execution time required for sequence alignment of long
reads negatively affects genomic studies relying on sequence alignment.
Although pre-alignment filtering as an extra step before alignment was recently
introduced to mitigate sequence alignment for short reads, these filters do not
work as efficiently for long reads. Moreover, even with efficient pre-alignment
filters, the overall end-to-end (i.e., filtering + original alignment)
execution time of alignment for long reads remains high, while the filtering
step is now a major portion of the end-to-end execution time.
Our paper makes three contributions. First, it identifies data movement of
sequences between memory units and computing units as the main source of
inefficiency for pre-alignment filters of long reads. Although these filters
reject many long sequence pairs before they reach the alignment stage,
the rejection itself still incurs a large time and energy cost due to the
volume of data transferred between memory and processor.
Second, this paper introduces an adaptation of a short-read pre-alignment
filtering algorithm suitable for long reads. We call this LongGeneGuardian.
Finally, it presents FilterFuse, an architecture that supports
LongGeneGuardian inside the memory. FilterFuse exploits the
Computation-In-Memory computing paradigm, eliminating the cost of data movement
in LongGeneGuardian.
Our evaluations show that FilterFuse improves the execution time of filtering
by 120.47x for long reads compared to the State-of-the-Art (SoTA) filter,
SneakySnake. FilterFuse also improves the end-to-end execution time of
sequence alignment by up to 49.14x and 5207.63x compared to SneakySnake with
a SoTA aligner and to the SoTA aligner alone, respectively.
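To make the role of a pre-alignment filter concrete, here is a minimal sketch of one classical filtering idea, the q-gram lemma: a single edit destroys at most q overlapping q-grams, so a pair sharing too few q-grams can be safely rejected before alignment. This is a generic illustration of the filtering step, not the LongGeneGuardian algorithm or SneakySnake.

```python
# Illustrative q-gram pre-alignment filter (not LongGeneGuardian):
# reject read/reference pairs that provably exceed the edit threshold.
from collections import Counter

def qgram_profile(seq, q):
    """Multiset of all overlapping substrings of length q."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))

def passes_filter(read, ref, max_edits, q=4):
    """q-gram lemma: if edit_distance(read, ref) <= E, the two profiles
    share at least (max_len - q + 1) - q*E q-grams. Fewer shared
    q-grams means the pair can be rejected without aligning it."""
    shared = sum((qgram_profile(read, q) & qgram_profile(ref, q)).values())
    threshold = max(len(read), len(ref)) - q + 1 - q * max_edits
    return shared >= threshold
```

The data-movement problem the paper identifies is visible even in this sketch: both full sequences must be streamed to the processor just to compute the profiles, which is exactly the cost an in-memory filter like FilterFuse eliminates.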
High-Performance Data Mapping for BNNs on PCM-based Integrated Photonics
State-of-the-Art (SotA) hardware implementations of Deep Neural Networks
(DNNs) incur high latencies and costs. Binary Neural Networks (BNNs) are
potential alternative solutions to realize faster implementations without
losing accuracy. In this paper, we first present a new data mapping, called
TacitMap, suited for BNNs implemented based on a Computation-In-Memory (CIM)
architecture. TacitMap maximizes the use of available parallelism, while CIM
architecture eliminates the data movement overhead. We then propose a hardware
accelerator based on optical phase change memory (oPCM) called EinsteinBarrier.
EinsteinBarrier incorporates TacitMap and adds an extra dimension for
parallelism through wavelength division multiplexing, leading to extra latency
reduction. The simulation results show that, compared to the SotA CIM baseline,
TacitMap and EinsteinBarrier significantly improve execution time by up to
~154x and ~3113x, respectively, while also maintaining the energy consumption
within 60% of that of the CIM baseline.
Comment: To appear in Design Automation and Test in Europe (DATE), 202
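The extra dimension of parallelism from wavelength division multiplexing can be sketched abstractly (our reading of the idea, not the paper's mapping): several crossbar rows are driven on distinct wavelengths at once, so one optical step yields one binary dot product per channel. The channel count and the sequential loop below are modeling assumptions; in hardware the per-channel work happens concurrently.

```python
# Abstract model of WDM row-parallelism (illustrative, not EinsteinBarrier):
# each wavelength channel evaluates one row's XNOR-popcount per step.

NUM_CHANNELS = 4  # hypothetical number of WDM wavelengths

def wdm_step(weight_rows, activations):
    """One optical step: up to NUM_CHANNELS rows are evaluated at once
    (modeled sequentially here, concurrent in hardware)."""
    outputs = []
    for row in weight_rows[:NUM_CHANNELS]:
        matches = sum(1 for w, a in zip(row, activations) if w == a)
        outputs.append(2 * matches - len(row))
    return outputs

weights = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
x = [1, 0, 0, 1]
print(wdm_step(weights, x))
```

The latency gain reported above comes from this structure: a mapping like TacitMap fills the crossbar dimension, and WDM multiplies the rows retired per step by the number of usable wavelengths.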
