Search CORE

COHERENT/INCOHERENT MAGNETIZATION DYNAMICS OF NANOMAGNETIC DEVICES FOR ULTRA-LOW ENERGY COMPUTING

Author: Al-Rashid Md Mamun
Publication venue: VCU Scholars Compass
Publication date: 01/01/2017

Nanomagnetic computing devices are inherently nonvolatile and show unique transfer characteristics, while their switching energy requirements are on par with, if not better than, those of state-of-the-art CMOS-based devices. These characteristics make them very attractive for both Boolean and non-Boolean computing applications. Among the different strategies employed to switch nanomagnetic computing devices (e.g., magnetic field, spin transfer torque, spin orbit torque), strain-induced switching has been shown to be among the most energy efficient. Strain-switched nanomagnetic devices are also amenable to non-Boolean computing applications. Such strain-mediated magnetization switching, termed here “Straintronics”, is implemented by switching the magnetization of the magnetic layer of a magnetostrictive-piezoelectric nanoscale heterostructure by applying an electric field in the underlying piezoelectric layer. The mode of “straintronic” switching, i.e., coherent vs. incoherent switching of spins, can affect device performance metrics such as speed, energy dissipation and switching error. Relatively little research had been performed on understanding the switching mechanism (coherent vs. incoherent) in straintronic devices and on their adaptation for non-Boolean computing; both are studied in this thesis. Detailed studies of the effects of nanomagnet geometry and size on the coherence of the switching process, and ultimately on the performance of such strain-switched nanomagnetic devices, have been performed. These studies also contributed to optimizing designs for low-energy, low-dynamic-error operation of straintronic logic devices and identified avenues for further research. A novel non-Boolean “straintronic” computing device (Ternary Content Addressable Memory, abbreviated as TCAM) has been proposed and evaluated through numerical simulations.
This device showed significant improvement over existing CMOS-based TCAM implementations in terms of scaling, energy-delay product, operational simplicity, etc. The experimental part of this thesis answered a very fundamental question in strain-induced magnetization rotation. Specifically, this experiment used polarized neutron reflectometry to study how the magnetization orientation varies along the thickness of a magnetostrictive thin film during strain-induced magnetization rotation, and demonstrated non-uniform magnetization rotation along the thickness of the sample. Additional experimental work was performed to lay the groundwork for an ultra-low-voltage straintronic switching demonstration. Preliminary sample fabrication and characterization that can potentially lead to low-voltage (~10-100 mV) operation and local clocking of such devices has been performed.

VCU Scholars Compass

FeFET Based Nonvolatile TCAM and DRAM Development

Author: Bayram Ismail
Publication date: 17/04/2018

Ferroelectric Field Effect Transistor (FeFET) is a promising nonvolatile device which provides high integration density, fast programming speed, and excellent CMOS compatibility. In general, the non-volatility of FeFET is impacted by its physical structure, and there is a trade-off between data retention time and device endurance. To improve cell endurance, for example, the ferroelectric layer of the FeFET needs to be programmed to a low polarization level, leading to a short retention time. In ferroelectric DRAM (FeDRAM) design, degradation in FeFET retention time and write-read disturbance require FeDRAM cells to be periodically refreshed in order to prevent data loss. In this work, I propose a novel adaptive refreshing and read voltage control scheme to minimize the energy overheads associated with FeDRAM refreshing while still achieving high cell access reliability. In addition to the DRAM application, FeFET-based TCAM is also studied. TCAM (ternary content addressable memory) is a special memory type that can compare input search data with stored data and return the location (sometimes, the associated content) of matched data. TCAM is widely used in microprocessor designs as well as communication chips, e.g., for IP routing. Following technology advances in emerging nonvolatile memories (eNVM), applying eNVM to TCAM designs has become attractive for achieving high density and low standby power. In this work, I examined the applications of three promising eNVM technologies, i.e., magnetic tunneling junction (MTJ), memristor, and ferroelectric memory field effect transistor (FeMFET), in the design of nonvolatile TCAM cells. All these technologies can achieve close-to-zero standby power, though each of them has very different pros and cons.
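To make the TCAM search operation described above concrete, here is a minimal software sketch, assuming each stored word is a string over '0', '1' and a don't-care symbol 'X'. It only models the compare-and-report behavior, not FeFET, MTJ, memristor or FeMFET cells, and the function names (`ternary_match`, `tcam_search`, `tcam_first_match`) are hypothetical. Reporting the lowest matching index is an assumed priority rule, one common convention for resolving multiple hits.

```python
from typing import List, Optional


def ternary_match(stored: str, key: str) -> bool:
    """Return True if the search key matches one stored ternary word.

    'X' in the stored word is a don't-care that matches either bit value.
    """
    if len(stored) != len(key):
        raise ValueError("stored word and key must have the same width")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def tcam_search(table: List[str], key: str) -> List[int]:
    """Compare the key against every entry (a real TCAM does this in parallel)
    and return the indices of all matching entries."""
    return [i for i, word in enumerate(table) if ternary_match(word, key)]


def tcam_first_match(table: List[str], key: str) -> Optional[int]:
    """Assumed priority rule: report the lowest matching index, or None."""
    hits = tcam_search(table, key)
    return hits[0] if hits else None


table = ["10XX", "1010", "0XX1", "XXXX"]
print(tcam_search(table, "1010"))       # [0, 1, 3]
print(tcam_first_match(table, "0111"))  # 2
```

The 'X' entries show why a ternary memory is useful for tasks such as IP routing: a single stored word can cover a whole range of search keys.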

D-Scholarship@Pitt

Fully-Binarized, Parallel, RRAM-based Computing Primitive for In-Memory Similarity Search

Author: Bricalli Alessandro
Kingra Sandeep Kaur
Molas Gabriel
Parmar Vivek
Piccolboni Giuseppe
Regev Amir
Suri Manan
Verma Deepak
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/09/2022

In this work, we propose a fully-binarized XOR-based IMSS (In-Memory Similarity Search) using RRAM (Resistive Random Access Memory) arrays. The XOR (Exclusive OR) operation is realized using 2T-2R bitcells arranged along the column in an array. This enables simultaneous match operations across multiple stored data vectors by performing analog column-wise XOR operations and summation to compute the HD (Hamming Distance). The proposed scheme is experimentally validated on fabricated RRAM arrays. Full-system validation is performed through SPICE simulations using the open-source Skywater 130 nm CMOS PDK, demonstrating an energy of 17 fJ per XOR operation using the proposed bitcell, with a full-system power dissipation of 145 μW. Using projected estimations at an advanced node (28 nm), energy savings of ≈1.5× compared to the state of the art can be observed for a fixed workload. Application-level validation is performed on an HSI (Hyper-Spectral Image) pixel classification task using the Salinas dataset, demonstrating an accuracy of 90%.
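The XOR-then-sum idea behind the Hamming distance computation can be illustrated digitally. The sketch below is a hypothetical software analogue, assuming binary vectors packed into Python integers; the fabricated design performs the XOR in 2T-2R RRAM bitcells and sums the results in the analog domain along each column, which this code does not model. The names `hamming_distance` and `similarity_search` are illustrative only.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """XOR marks the mismatching bit positions; counting the 1s gives the HD."""
    return bin(a ^ b).count("1")


def similarity_search(stored: List[int], query: int) -> Tuple[int, List[int]]:
    """Compute the HD from the query to every stored vector and return the
    index of the closest vector together with all distances."""
    distances = [hamming_distance(v, query) for v in stored]
    best = min(range(len(stored)), key=distances.__getitem__)
    return best, distances


stored = [0b10110010, 0b10110011, 0b01001101]
best, dists = similarity_search(stored, 0b10110111)
print(dists)  # [2, 1, 6]
print(best)   # 1
```

Selecting the minimum distance (a nearest-neighbor match) is an assumed way of using the distances; the abstract itself only states that the HD is computed for all stored vectors at once.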

arXiv.org e-Print Archive

In-memory computing with emerging memory devices: Status and outlook

Author: Cattaneo L.
Farronato M.
Glukhov A.
Ielmini D.
Lepri N.
Mannocci P.
Sun Z.
Publication date: 29/11/2022

Supporting data for "In-memory computing with emerging memory devices: status and outlook", submitted to APL Machine Learning

Archivio istituzionale della ricerca - Politecnico di Milano

Directory of Open Access Journals

FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks

Author: Chen Hui
Duong Luan H. K.
Liu Di
Liu Weichen
Zhu Shien
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2022

Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited research. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized and the addition operation is not efficient. In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and increase the parallelism across memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% average sparsity. Comment: 14 pages
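Why ternary weights suit addition-based in-memory computing can be seen in a small example. This is a minimal sketch, assuming weights restricted to {-1, 0, +1} and integer activations; `ternary_dot` is a hypothetical name, and the zero-skipping branch only mirrors the idea behind FAT's Sparse Addition Control Unit, not its hardware design.

```python
from typing import List, Tuple


def ternary_dot(activations: List[int], weights: List[int]) -> Tuple[int, int]:
    """Dot product with ternary weights using only additions and subtractions.

    Zero weights are skipped instead of being processed.
    Returns (result, number of add/subtract operations performed).
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    acc = 0
    ops = 0
    for x, w in zip(activations, weights):
        if w == 0:
            continue  # sparse skip: a zero weight contributes nothing
        elif w == 1:
            acc += x  # multiply by +1 becomes an addition
        elif w == -1:
            acc -= x  # multiply by -1 becomes a subtraction
        else:
            raise ValueError("weights must be -1, 0 or +1")
        ops += 1
    return acc, ops


acts = [3, -2, 5, 7, 1]
wts = [1, 0, -1, 0, 1]
print(ternary_dot(acts, wts))  # (-1, 3)
```

With 2 of 5 weights equal to zero, only 3 operations are needed instead of 5; the higher the sparsity, the more work is skipped, which is the effect the 80% average sparsity result above exploits.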

arXiv.org e-Print Archive

Physically Equivalent Intelligent Systems for Reasoning Under Uncertainty at Nanoscale

Author: Khasanvis Santosh
Publication venue: ScholarWorks@UMass Amherst
Publication date: 09/11/2015

Machines today lack the inherent ability to reason and make decisions, or to operate in the presence of uncertainty. Machine-learning methods such as Bayesian Networks (BNs) are widely acknowledged for their ability to uncover relationships and generate causal models for complex interactions. However, their massive computational requirements, when implemented on conventional computers, hinder their usefulness in many critical problem areas, e.g., the genetic basis of diseases, macro finance, text classification and environment monitoring. We propose a new non-von Neumann technology framework, purposefully architected across all layers, for solving these problems efficiently through physical equivalence, enabled by emerging nanotechnology. The architecture builds on a probabilistic information representation and a multi-domain mixed-signal circuit style, and is tightly coupled to a nanoscale physical layer that spans magnetic and electrical domains. Based on bottom-up device-circuit-architecture simulations, we show up to four orders of magnitude performance improvement (using a computational resolution of 0.1) vs. best-of-breed multi-core machines with 100 processors, for BNs with about a million variables. Smaller problems of ~100 variables can be realized at 20 mW power consumption and a very small area of a few tenths of a mm². Our vision is to enable solving complex Bayesian problems in real time, as well as to enable intelligence capabilities at a small scale everywhere, ushering in a new era of machine intelligence.

ScholarWorks@UMass Amherst

GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

Author: Alser Mohammed
Alserr Nour Almadhoun
Baranwal Akanksha
Cali Damla Senol
Firtina Can
Manglik Aditya
Mao Haiyu
Mutlu Onur
Sadrosadati Mohammad
Publication date: 18/09/2022

Nanopore sequencing is a widely used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, slowing down the genome analysis pipeline. This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory fine-grained collaborative execution of the major genome analysis steps in parallel; (2) a new technique for early rejection of low-quality and unmapped reads that stops the execution of genome analysis for such reads in a timely manner, reducing inefficient computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with negligible accuracy loss compared to state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings. Comment: 17 pages, 13 figures

arXiv.org e-Print Archive

Quantum and spin-based tunneling devices for memory systems

Author: Sudirgo Stephen
Publication venue: RIT Scholar Works
Publication date: 01/05/2006

Rapid developments in information technology, such as the internet, portable computing, and wireless communication, create a huge demand for fast and reliable ways to store and process information. Thus far, this need has been paralleled by the revolution in solid-state memory technologies. Memory devices, such as SRAM, DRAM, and flash, have been widely used in most electronic products. The primary strategy to keep up with this trend is miniaturization. CMOS devices have been scaled down below 45 nm, approaching dimensions of only a few atomic layers. Scaling, however, will soon reach the physical limitations of the material and cease to yield the desired enhancement in device performance. In this thesis, an alternative to scaling is proposed and successfully realized. The proposed scheme integrates quantum devices, Si/SiGe resonant interband tunnel diodes (RITD), with classical CMOS devices, forming a microsystem of disparate devices to achieve higher performance as well as higher density. The device/circuit designs, layouts and masks involving 12 levels were fabricated using a process that incorporates nearly a hundred processing steps. Utilizing the unique characteristics of each component, a low-power tunneling-based static random access memory (TSRAM) has been demonstrated. The TSRAM cells exhibit bistable operation with a power supply voltage as low as 0.37 V. Various TSRAM cells were also constructed, and their latching mechanisms have been extensively investigated. In addition, the operating margins of TSRAM cells are evaluated for different device structures and for temperatures ranging from room temperature up to 200 °C. The versatility of TSRAM is extended beyond the binary system. Using multi-peak Si/SiGe RITDs, various multi-valued TSRAM (MV-TSRAM) configurations that can store more than two logic levels per cell are demonstrated. By virtue of this, memory density can be substantially increased.
Using two novel methods, ambipolar operation and the use of enable/disable transistors, a six-valued MV-TSRAM cell is demonstrated. A novel concept of integrating Si/SiGe RITDs with spin tunnel devices, magnetic tunnel junctions (MTJ), has been developed. This hybrid approach adds non-volatility and multi-valued memory potential, as demonstrated by theoretical predictions and simulations. The challenges of physically fabricating these devices, including process compatibility and device design, have been identified. A test-bed approach to fabricating RITD-MTJ structures has been developed. In conclusion, this body of work has created a sound foundation for new research frontiers in four major areas: integrated TSRAM systems, MV-TSRAM systems, MTJ/RITD-based nonvolatile MRAM, and RITD/CMOS logic circuits.
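An idealized way to see why storing more logic levels per cell raises density: a cell with L distinguishable levels can hold at most log2 L bits. This is a back-of-the-envelope bound, assuming every level is reliably distinguishable and ignoring any extra area or peripheral circuitry a multi-valued cell may need.

```latex
b = \log_2 L, \qquad
b_{\text{binary}} = \log_2 2 = 1, \qquad
b_{\text{six-valued}} = \log_2 6 \approx 2.585
```

Under these assumptions, a six-valued MV-TSRAM cell carries about 2.58 times the information of a binary cell of the same footprint.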

RIT Scholar Works

ENERGY-EFFICIENT AND SECURE HARDWARE FOR INTERNET OF THINGS (IoT) DEVICES

Author: Selvakumaran Dinesh Kumar
Publication venue: UKnowledge
Publication date: 01/01/2018

Internet of Things (IoT) is a network of devices that are connected through the Internet to exchange data for intelligent applications. Though IoT devices provide several advantages that improve the quality of life, they also present security challenges. The security issues related to IoT devices include leakage of information through Differential Power Analysis (DPA) based side-channel attacks, authentication, piracy, etc. DPA is a type of side-channel attack in which the attacker monitors the power consumption of the device to guess the secret key stored in it. There are several countermeasures against DPA attacks. However, most of the existing countermeasures consume high power, which makes them unsuitable for power-constrained devices. IoT devices are battery operated; hence it is important to investigate methods to design energy-efficient and secure IoT devices that are not susceptible to DPA attacks. In this research, we have explored the usefulness of a novel computing platform called adiabatic logic, low-leakage FinFET devices, and Magnetic Tunnel Junction (MTJ) Logic-in-Memory (LiM) architecture to design energy-efficient and DPA-secure hardware. Further, we have also explored the usefulness of adiabatic logic in the design of energy-efficient and reliable Physically Unclonable Function (PUF) circuits to overcome the authentication and piracy issues in IoT devices. Adiabatic logic is a low-power circuit design technique for building energy-efficient hardware. Adiabatic logic has reduced dynamic switching energy loss due to the recycling of charge to the power clock. As the first contribution of this dissertation, we have proposed a novel DPA-resistant adiabatic logic family called Energy-Efficient Secure Positive Feedback Adiabatic Logic (EE-SPFAL). EE-SPFAL based circuits are energy-efficient compared to conventional CMOS-based designs because they recycle the charge after every clock cycle.
Further, EE-SPFAL based circuits consume uniform power irrespective of input data transitions, which makes them resilient against DPA attacks. Scaling of CMOS transistors has served the industry for more than 50 years, providing integrated circuits that are denser and cheaper along with high performance and low power. However, scaling of the transistors leads to an increase in leakage current. Increased leakage current reduces the energy efficiency of computing circuits and increases their vulnerability to DPA attacks. Hence, it is important to investigate crypto circuits in low-leakage devices such as FinFETs to make them energy-efficient and DPA-resistant. In this dissertation, we have proposed a novel FinFET based Secure Adiabatic Logic (FinSAL) family. FinSAL based designs utilize the low-leakage FinFET device along with adiabatic logic principles to improve energy efficiency as well as resistance against DPA attacks. Recently, Magnetic Tunnel Junction (MTJ)/CMOS based Logic-in-Memory (LiM) circuits have been explored to design low-power non-volatile hardware. Some of the advantages of MTJ devices include non-volatility, near-zero leakage power, high integration density and easy compatibility with CMOS devices. However, the differences in power consumption between different MTJ switching transitions increase vulnerability to Differential Power Analysis (DPA) based side-channel attacks. Further, MTJ/CMOS hybrid logic circuits that require frequent switching of MTJs are not very energy-efficient due to the significant energy required to switch the MTJ devices. In the third contribution of this dissertation, we have investigated a novel approach to building cryptographic hardware in MTJ/CMOS circuits using a Look-Up Table (LUT) based method, in which the data stored in the MTJs remain constant during the entire encryption/decryption operation.
Currently, a high supply voltage is required in both the writing and sensing operations of hybrid MTJ/CMOS based LiM circuits, which consumes a considerable amount of energy. In order to meet the power budget of low-power devices, it is important to investigate novel techniques to design ultra-low-power MTJ/CMOS circuits. In the fourth contribution of this dissertation, we have proposed a novel energy-efficient Secure MTJ/CMOS Logic (SMCL) family. The proposed SMCL logic family consumes uniform power irrespective of data transitions in the MTJ and, by using a charge sharing technique, is more energy-efficient than state-of-the-art MTJ/CMOS designs. The other important contribution of this dissertation is the design of a reliable Physically Unclonable Function (PUF). PUFs are circuits used to generate secret keys to address piracy and device authentication problems. However, existing PUFs consume high power and suffer from the problem of generating unreliable bits. This dissertation addresses this issue by designing a novel adiabatic logic based PUF. The time-ramp voltages in the adiabatic PUF are utilized to improve the reliability of the PUF along with its energy efficiency. The reliability of the adiabatic logic based PUF proposed in this dissertation is tested through simulations under temperature and supply voltage variations.

University of Kentucky