Energy Efficient Learning with Low Resolution Stochastic Domain Wall Synapse Based Deep Neural Networks
We demonstrate that extremely low resolution quantized (nominally 5-state)
synapses with large stochastic variations in Domain Wall (DW) position can be
both energy efficient and achieve reasonably high testing accuracies compared
to Deep Neural Networks (DNNs) of similar sizes using floating-point precision
synaptic weights. Specifically, voltage controlled DW devices demonstrate
stochastic behavior as modeled rigorously with micromagnetic simulations and
can only encode limited states; however, they can be extremely energy efficient
during both training and inference. We show that by implementing suitable
modifications to the learning algorithms, we can address the stochastic
behavior as well as mitigate the effect of their low-resolution to achieve high
testing accuracies. In this study, we propose both in-situ and ex-situ training
algorithms, based on modification of the algorithm proposed by Hubara et al.
[1] which works well with quantization of synaptic weights. We train several
5-layer DNNs on the MNIST dataset using 2-, 3- and 5-state DW devices as synapses.
For in-situ training, a separate high precision memory unit is adopted to
preserve and accumulate the weight gradients, which are then quantized to
program the low precision DW devices. Moreover, a sizeable noise tolerance
margin is used during the training to address the intrinsic programming noise.
For ex-situ training, a precursor DNN is first trained based on the
characterized DW device model and a noise tolerance margin, which is similar to
the in-situ training. Remarkably, for in-situ inference the energy dissipation
to program the devices is only 13 pJ per inference given that the training is
performed over the entire MNIST dataset for 10 epochs.
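
The in-situ scheme described in this abstract (a high-precision memory that accumulates updates, followed by quantization to a few device states and noisy programming) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the evenly spaced levels in [-1, 1], the Gaussian programming-noise model, and the function names are hypothetical and are not taken from the paper, and the noise tolerance margin is not modeled.

```python
import random

def quantize(w, n_states=5, w_max=1.0):
    """Snap a weight to the nearest of n_states evenly spaced levels in [-w_max, w_max]."""
    step = 2 * w_max / (n_states - 1)
    clipped = max(-w_max, min(w_max, w))
    return round((clipped + w_max) / step) * step - w_max

def program_device(target, noise_std, rng):
    """Hypothetical device model: programmed value = target level + Gaussian noise."""
    return target + rng.gauss(0.0, noise_std)

def in_situ_step(w_hp, grad, lr, noise_std, rng, n_states=5):
    """Update the high-precision copy, then quantize it and program the device."""
    w_hp = max(-1.0, min(1.0, w_hp - lr * grad))
    w_dev = program_device(quantize(w_hp, n_states), noise_std, rng)
    return w_hp, w_dev

if __name__ == "__main__":
    rng = random.Random(0)
    w_hp = 0.2
    for grad in (-2.0, 1.0, -3.0):
        w_hp, w_dev = in_situ_step(w_hp, grad, lr=0.05, noise_std=0.05, rng=rng)
        print(f"high-precision weight {w_hp:+.3f}, device value {w_dev:+.3f}")
```

The high-precision copy lets small updates accumulate until they cross a quantization boundary; without it, most updates would be lost when rounding to 2, 3 or 5 states.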
Skyrmion Logic-In-Memory Architecture for Maximum/Minimum Search
In modern computing systems there is a need to process large amounts of data while maintaining high efficiency. Limited memory bandwidth, coupled with the performance gap between memory and logic, heavily impacts algorithm performance, increasing the overall time and energy required for computation. A possible approach to overcome such limitations is Logic-In-Memory (LIM). In this paper, we propose a LIM architecture based on a non-volatile skyrmion-based racetrack memory. The architecture can be used as a memory or can perform advanced logic functions on the stored data, for example searching for the maximum/minimum number. The circuit has been designed and validated using physical simulations for the memory array together with digital design tools for the control logic. The results highlight the small area of the proposed architecture and its good energy efficiency compared with a reference CMOS implementation.
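
The abstract does not say which search algorithm the architecture uses. As an illustration of how a maximum/minimum search can be carried out across all stored words at once, the sketch below shows one common bit-serial elimination approach; the function names and the software model of the memory as a list of integers are assumptions, not details from the paper.

```python
def bit_serial_max(words, width):
    """Find the maximum by scanning bit positions from MSB to LSB.

    At each bit position, if any surviving candidate stores a 1,
    every candidate storing a 0 is eliminated.
    """
    if not words:
        raise ValueError("words must not be empty")
    candidates = [True] * len(words)
    for bit in range(width - 1, -1, -1):
        ones = [c and ((w >> bit) & 1) == 1 for c, w in zip(candidates, words)]
        if any(ones):
            candidates = ones
    return words[candidates.index(True)]

def bit_serial_min(words, width):
    """Find the minimum by running the maximum search on bitwise complements."""
    mask = (1 << width) - 1
    return mask ^ bit_serial_max([mask ^ w for w in words], width)

if __name__ == "__main__":
    data = [5, 12, 7, 12, 3]
    print(bit_serial_max(data, 4), bit_serial_min(data, 4))
```

Each pass touches a single bit position of every word, so the number of steps depends on the word width rather than on the number of stored words, which is what makes this style of search attractive for in-memory execution.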
PIRM: Processing In Racetrack Memories
The growth in data needs of modern applications has created significant
challenges for modern systems, leading to a "memory wall." Spintronic Domain Wall
Memory (DWM), related to Spin-Transfer Torque Memory (STT-MRAM), provides
near-SRAM read/write performance, energy savings and nonvolatility, potential
for extremely high storage density, and does not have significant endurance
limitations. However, DWM's benefits cannot address data access latency and
throughput limitations of memory bus bandwidth. We propose PIRM, a DWM-based
in-memory computing solution that leverages the properties of DWM nanowires and
allows them to serve as polymorphic gates. While normally DWM is accessed by
applying spin polarized currents orthogonal to the nanowire at access points to
read individual bits, transverse access along the DWM nanowire allows the
differentiation of the aggregate resistance of multiple bits in the nanowire,
akin to a multilevel cell. PIRM leverages this transverse reading to directly
provide bulk-bitwise logic of multiple adjacent operands in the nanowire,
simultaneously. Based on this in-memory logic, PIRM provides a technique to
conduct multi-operand addition and two operand multiplication using transverse
access. PIRM provides a 1.6x speedup compared to the leading DRAM PIM technique
for query applications that leverage bulk bitwise operations. Compared to the
leading PIM technique for DWM, PIRM improves performance by 6.9x, 2.3x and
energy by 5.5x, 3.4x for 8-bit addition and multiplication, respectively. For
arithmetic heavy benchmarks, PIRM reduces access latency by 2.1x, while
decreasing energy consumption by 25.2x for a reasonable 10% area overhead
versus non-PIM DWM.
Comment: This paper is accepted to the IEEE/ACM Symposium on
Microarchitecture, October 2022 under the title "CORUSCANT: Fast Efficient
Processing-in-Racetrack Memories".
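
To illustrate how a multilevel transverse read can yield bulk-bitwise logic, the sketch below assumes that the aggregate resistance of a nanowire segment reveals how many of its bits are 1. That count-of-ones model, and the names `transverse_read` and `bulk_bitwise`, are assumptions made for illustration rather than details stated in the abstract.

```python
def transverse_read(bits):
    """Assumed model: the sensed level equals the number of 1 bits in the segment."""
    return sum(bits)

def bulk_bitwise(operands, op):
    """Apply a multi-operand bitwise op column by column using only the ones count."""
    n = len(operands)
    result = []
    for column in zip(*operands):
        ones = transverse_read(column)
        if op == "and":
            result.append(1 if ones == n else 0)
        elif op == "or":
            result.append(1 if ones >= 1 else 0)
        elif op == "xor":
            result.append(ones % 2)
        else:
            raise ValueError(f"unsupported op: {op}")
    return result

if __name__ == "__main__":
    rows = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
    for name in ("and", "or", "xor"):
        print(name, bulk_bitwise(rows, name))
```

Under this model a single read per bit position is enough to evaluate AND, OR and XOR over all operands simultaneously, and the same ones count is the starting point for multi-operand addition.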
RISC-Vlim, a RISC-V Framework for Logic-in-Memory Architectures
Most modern CPU architectures are based on the von Neumann principle, where memory and processing units are separate entities. Although processing unit performance has improved over the years, memory capacity has not followed the same trend, creating a performance gap between them. This problem is known as the "memory wall" and severely limits the performance of a microprocessor. One of the most promising solutions is the "logic-in-memory" approach. It consists of merging memory and logic units, enabling data to be processed directly inside the memory itself. Here we propose a RISC-V framework that supports logic-in-memory operations. We substitute data memory with a circuit capable of storing data and of performing in-memory computation. The framework is based on a standard memory interface, so different logic-in-memory architectures can be inserted inside the microprocessor, based both on CMOS and emerging technologies. The main advantage of this framework is the possibility of comparing the performance of different logic-in-memory solutions on code execution. We demonstrate the effectiveness of the framework using a CMOS volatile memory and a memory based on an emerging technology, racetrack logic. The results demonstrate an improvement in algorithm execution speed and a reduction in energy consumption.
Energy Efficient Spintronic Device for Neuromorphic Computation
Future computing will require significant development in new computing device paradigms. This is motivated by CMOS devices reaching their technological limits, the need for non-von Neumann architectures, as well as the energy constraints of wearable technologies and embedded processors. The first device proposal, an energy-efficient voltage-controlled domain wall device for implementing an artificial neuron and synapse, is analyzed using micromagnetic modeling. By controlling the domain wall motion utilizing spin transfer or spin orbit torques in association with voltage generated strain control of perpendicular magnetic anisotropy in the presence of Dzyaloshinskii-Moriya interaction (DMI), different positions of the domain wall are realized in the free layer of a magnetic tunnel junction to program different synaptic weights. Additionally, an artificial neuron can be realized by combining this DW device with a CMOS buffer. The second neuromorphic device proposal is inspired by the brain. The membrane potentials of many neurons oscillate in a subthreshold damped fashion and fire when excited by an input frequency that nearly equals their eigenfrequency. We investigate theoretical implementation of such “resonate-and-fire” neurons by utilizing the magnetization dynamics of a fixed magnetic skyrmion-based free layer of a magnetic tunnel junction (MTJ). Voltage control of magnetic anisotropy or voltage generated strain results in expansion and shrinking of a skyrmion core that mimics the subthreshold oscillation. Finally, we show that such resonate-and-fire neurons have potential application in coupled nanomagnetic oscillator based associative memory arrays.
ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference
The primary operation in DNNs is the dot product of quantized input
activations and weights. Prior works have proposed the design of memory-centric
architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM
(ReRAM) technology is especially appealing for PIM-based DNN accelerators due
to its high density to store weights, low leakage energy, low read latency, and
high performance capabilities to perform the DNN dot-products massively in
parallel within the ReRAM crossbars. However, the main bottleneck of these
architectures is the energy-hungry analog-to-digital conversions (ADCs)
required to perform analog computations in-ReRAM, which penalizes the
efficiency and performance benefits of PIM. To improve energy-efficiency of
in-ReRAM analog dot-product computations we present ReDy, a hardware
accelerator that implements a ReRAM-centric Dynamic quantization scheme to take
advantage of the bit serial streaming and processing of activations. The energy
consumption of ReRAM-based DNN accelerators is directly proportional to the
numerical precision of the input activations of each DNN layer. In particular,
ReDy exploits that activations of CONV layers from Convolutional Neural
Networks (CNNs), a subset of DNNs, are commonly grouped according to the size
of their filters and the size of the ReRAM crossbars. Then, ReDy quantizes
on-the-fly each group of activations with a different numerical precision based
on a novel heuristic that takes into account the statistical distribution of
each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars
and the number of A/D conversions compared to a static 8-bit uniform
quantization. We evaluate ReDy on a popular set of modern CNNs. On average,
ReDy provides 13% energy savings over an ISAAC-like accelerator with
negligible accuracy loss and area overhead.
Comment: 13 pages, 16 figures, 4 Tables
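
The abstract does not describe ReDy's heuristic, so the sketch below only illustrates the general idea that per-group precision can reduce bit-serial work. It uses a deliberately simple stand-in rule (the bit width needed for the largest activation in each group), which is a hypothetical choice and not ReDy's method; the function names and the cost model of one crossbar cycle per streamed bit are likewise assumptions.

```python
def group_precision(group, max_bits=8):
    """Stand-in rule (not ReDy's heuristic): bits needed for the group's largest value."""
    peak = max(group)
    return min(max_bits, max(1, peak.bit_length()))

def bit_serial_cycles(activations, group_size, max_bits=8):
    """Compare streamed-bit cycles for per-group precision versus a static width.

    Assumes unsigned integer activations and one cycle per streamed bit per group.
    """
    dynamic = 0
    static = 0
    for start in range(0, len(activations), group_size):
        group = activations[start:start + group_size]
        dynamic += group_precision(group, max_bits)
        static += max_bits
    return dynamic, static

if __name__ == "__main__":
    acts = [3, 0, 1, 2, 200, 17, 64, 5, 0, 0, 0, 0, 15, 16, 1, 0]
    dynamic, static = bit_serial_cycles(acts, group_size=4)
    print(f"dynamic: {dynamic} cycles, static 8-bit: {static} cycles")
```

Because activations are streamed one bit at a time, every bit removed from a group's precision removes one round of crossbar activity and the associated A/D conversions for that group.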