DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference
Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the
storage and computational complexity by decreasing the arithmetical precision
of activations and weights, a.k.a. tensors. Efficient hardware architectures
employ linear quantization to enable the deployment of recent DNNs onto
embedded systems and mobile devices. However, linear uniform quantization
cannot usually reduce the numerical precision to less than 8 bits without
sacrificing high performance in terms of model accuracy. The performance loss
is due to the fact that tensors do not follow uniform distributions. In this
paper, we show that a significant number of tensors fit an exponential
distribution. Then, we propose DNA-TEQ to exponentially quantize DNN tensors
with an adaptive scheme that achieves the best trade-off between numerical
precision and accuracy loss. The experimental results show that DNA-TEQ
provides a much lower quantization bit-width compared to previous proposals,
resulting in an average compression ratio of 40% over the linear INT8 baseline,
with negligible accuracy loss and without retraining the DNNs. Besides, DNA-TEQ
leads the way in performing dot-product operations in the exponential domain,
which saves 66% of energy consumption on average for a set of widely used DNNs.
Comment: 8 pages, 8 figures, 5 tables
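As a rough illustration of the idea in this abstract, the sketch below quantizes values to signed powers of a base and evaluates a dot product by adding exponents. It is not DNA-TEQ's actual scheme: the function names, the `base`, `bits`, and `scale` parameters, and the convention that a sign of 0 encodes zero are all assumptions made for this example.

```python
import math

def exp_quantize(x, base, bits, scale=1.0):
    """Map x to (sign, exponent) so that x ~= sign * scale * base**exponent.

    Hypothetical encoding: sign 0 stands for an exact zero.
    """
    if x == 0:
        return 0, 0
    sign = 1 if x > 0 else -1
    # Round to the nearest integer exponent in the log domain.
    e = round(math.log(abs(x) / scale, base))
    # Clamp to the range of a signed exponent stored in `bits` bits.
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return sign, max(lo, min(hi, e))

def exp_dequantize(sign, e, base, scale=1.0):
    """Recover the approximate real value from its (sign, exponent) code."""
    return sign * scale * base ** e

def exp_dot(qa, qw, base, scale_a=1.0, scale_w=1.0):
    """Dot product in the exponent domain, using base**i * base**j == base**(i + j).

    Products are accumulated as signed counts per exponent sum, so the only
    per-element arithmetic is an integer addition of exponents.
    """
    counts = {}
    for (sa, ea), (sw, ew) in zip(qa, qw):
        s = sa * sw
        if s != 0:
            counts[ea + ew] = counts.get(ea + ew, 0) + s
    return scale_a * scale_w * sum(c * base ** e for e, c in counts.items())

# Usage: quantize two short vectors with base 2 and 4-bit exponents.
activations = [0.5, 0.25]
weights = [2.0, 4.0]
qa = [exp_quantize(a, 2, 4) for a in activations]
qw = [exp_quantize(w, 2, 4) for w in weights]
result = exp_dot(qa, qw, 2)
```

With a base other than 2 the quantization levels are spaced differently, which is one way such a scheme could adapt to the distribution of each tensor.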
Locality of temperature in spin chains
In traditional thermodynamics, temperature is a local quantity: a subsystem
of a large thermal system is in a thermal state at the same temperature as the
original system. For strongly interacting systems, however, the locality of
temperature breaks down. We study the possibility of associating an effective
thermal state to subsystems of infinite chains of interacting spin particles of
arbitrary finite dimension. We study the effect of correlations and criticality
in the definition of this effective thermal state and discuss the possible
implications for the classical simulation of thermal quantum systems.
Comment: 18+9 pages, 12 figures
An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses
The constant growth of DNNs makes them challenging to implement and run
efficiently on traditional compute-centric architectures. Some accelerators
have attempted to add more compute units and on-chip buffers to solve the
memory wall problem without much success, and sometimes even worsening the
issue since more compute units also require higher memory bandwidth. Prior
works have proposed the design of memory-centric architectures based on the
Near-Data Processing (NDP) paradigm. NDP seeks to break the memory wall by
moving the computations closer to the memory hierarchy, reducing the data
movements and their cost as much as possible. The 3D-stacked memory is
especially appealing for DNN accelerators due to its high-density/low-energy
storage and near-memory computation capabilities to perform the DNN operations
massively in parallel. However, memory accesses remain the main bottleneck
for running modern DNNs efficiently.
To improve the efficiency of DNN inference, we present QeiHaN, a hardware
accelerator that implements a 3D-stacked memory-centric weight storage scheme
to take advantage of a logarithmic quantization of activations. In particular,
since activations of FC and CONV layers of modern DNNs are commonly represented
as powers of two with negative exponents, QeiHaN performs an implicit in-memory
bit-shifting of the DNN weights to reduce memory activity. Only the meaningful
bits of the weights required for the bit-shift operation are accessed. Overall,
QeiHaN reduces memory accesses by 25% compared to a standard memory
organization. We evaluate QeiHaN on a popular set of DNNs. On average, QeiHaN
provides speedup and energy savings over a Neurocube-like accelerator.
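The arithmetic behind this abstract can be sketched as follows: if an activation is approximated by 2^(-s), multiplying an integer weight by it reduces to a right shift by s, and the s low-order weight bits that are shifted out never contribute to the result. This is only an illustration under assumptions, not QeiHaN's hardware design: the function names, the `max_shift` limit, the use of `None` for zero activations, and the `weight_bits_needed` helper are all hypothetical.

```python
import math

def log2_quantize(a, max_shift=7):
    """Return a shift s such that a ~= 2**(-s) for 0 < a <= 1.

    Hypothetical convention: None encodes a zero (or negative) activation.
    """
    if a <= 0:
        return None
    s = round(-math.log2(a))
    return max(0, min(max_shift, s))

def shift_dot(weights, shifts):
    """Integer dot product where w * 2**(-s) is computed as w >> s.

    Python's >> on ints is an arithmetic shift, so negative weights floor.
    """
    acc = 0
    for w, s in zip(weights, shifts):
        if s is not None:
            acc += w >> s
    return acc

def weight_bits_needed(weight_bits, s):
    """Number of weight bits that survive a right shift by s."""
    return max(0, weight_bits - s)

# Usage: three integer weights and three activations in [0, 1].
weights = [64, -32, 8]
shifts = [log2_quantize(a) for a in (0.5, 0.25, 0.0)]
total = shift_dot(weights, shifts)
```

Under this reading, an 8-bit weight paired with a shift of 2 only needs its 6 most significant bits, which is one way to interpret accessing only the meaningful bits.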
ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference
The primary operation in DNNs is the dot product of quantized input
activations and weights. Prior works have proposed the design of memory-centric
architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM
(ReRAM) technology is especially appealing for PIM-based DNN accelerators due
to its high density to store weights, low leakage energy, low read latency, and
high performance capabilities to perform the DNN dot-products massively in
parallel within the ReRAM crossbars. However, the main bottleneck of these
architectures is the energy-hungry analog-to-digital conversions (ADCs)
required to perform analog computations in-ReRAM, which penalize the
efficiency and performance benefits of PIM. To improve the energy efficiency of
in-ReRAM analog dot-product computations, we present ReDy, a hardware
accelerator that implements a ReRAM-centric Dynamic quantization scheme to take
advantage of the bit serial streaming and processing of activations. The energy
consumption of ReRAM-based DNN accelerators is directly proportional to the
numerical precision of the input activations of each DNN layer. In particular,
ReDy exploits the fact that activations of CONV layers from Convolutional Neural
Networks (CNNs), a subset of DNNs, are commonly grouped according to the size
of their filters and the size of the ReRAM crossbars. Then, ReDy quantizes
on-the-fly each group of activations with a different numerical precision based
on a novel heuristic that takes into account the statistical distribution of
each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars
and the number of A/D conversions compared to a static 8-bit uniform
quantization. We evaluate ReDy on a popular set of modern CNNs. On average,
ReDy provides 13% energy savings over an ISAAC-like accelerator with
negligible accuracy loss and area overhead.
Comment: 13 pages, 16 figures, 4 tables
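The per-group idea described above can be sketched as follows, assuming unsigned integer activations that are streamed one bit per cycle. The heuristic here (the fewest bits that hold each group's largest value) is a simple stand-in chosen for illustration, not the statistical heuristic the abstract refers to, and all function names and parameters are hypothetical.

```python
def bits_for_group(group, max_bits=8):
    """Stand-in heuristic: fewest bits that represent the group's largest value."""
    peak = max(group)
    return max(1, min(max_bits, peak.bit_length()))

def plan_groups(activations, group_size, max_bits=8):
    """Split unsigned integer activations into groups and pick a bit-width for each."""
    groups = [activations[i:i + group_size]
              for i in range(0, len(activations), group_size)]
    return [(g, bits_for_group(g, max_bits)) for g in groups]

def bit_serial_cycles(plan):
    """With bit-serial streaming, each group costs one cycle per activation bit."""
    return sum(bits for _, bits in plan)

# Usage: two groups of four activations each.
activations = [3, 1, 0, 2, 200, 17, 5, 90]
plan = plan_groups(activations, group_size=4)
dynamic_cycles = bit_serial_cycles(plan)
static_cycles = 8 * len(plan)  # static 8-bit quantization for every group
```

In this toy case the first group needs only 2 bits while the second needs all 8, so the dynamic plan streams 10 bit-planes instead of 16.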