A Bit-Parallel Deterministic Stochastic Multiplier
This paper presents a novel bit-parallel deterministic stochastic multiplier,
which improves the area-energy-latency product by up to 10.610x,
while reducing the computational error by 32.2%, compared to three prior
stochastic multipliers.
Comment: To appear at IEEE ISQED 202
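The core idea behind deterministic stochastic multiplication can be sketched in software. The unary encoding, stream length, and exhaustive-pairing scheme below are illustrative assumptions, not the paper's circuit:

```python
# Sketch of deterministic stochastic multiplication (assumed scheme, not the
# paper's design): values in [0, 1] are encoded as unary bitstreams, and a
# bit-wise AND over exhaustively paired bits yields the exact product.

def unary_encode(value, length):
    """Encode a value in [0, 1] as a unary bitstream with round(value*length) ones."""
    ones = round(value * length)
    return [1] * ones + [0] * (length - ones)

def deterministic_multiply(a, b, length=16):
    """Multiply two values by pairing every bit of one stream with every bit
    of the other, ANDing each pair, and counting the resulting ones."""
    sa = unary_encode(a, length)
    sb = unary_encode(b, length)
    # Exhaustive pairing removes the random correlation error of classic
    # stochastic computing: each of the length*length bit pairs is visited once.
    ones = sum(x & y for x in sa for y in sb)
    return ones / (length * length)

print(deterministic_multiply(0.5, 0.25))  # 0.125, exact
```

The pairing makes the result deterministic and exact for values representable at the chosen stream length, at the cost of a quadratic number of bit operations, which is the overhead that bit-parallel designs aim to amortize.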
Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors
Photonic Microring Resonator (MRR) based hardware accelerators have been
shown to provide disruptive speedup and energy-efficiency improvements for
processing deep Convolutional Neural Networks (CNNs). However, previous
MRR-based CNN accelerators fail to provide efficient adaptability for CNNs with
mixed-sized tensors. One example of such CNNs is depthwise separable CNNs.
Performing inference of CNNs with mixed-sized tensors on such inflexible
accelerators often leads to low hardware utilization, which diminishes the
achievable performance and energy efficiency of the accelerators. In this
paper, we present a novel way of introducing reconfigurability in MRR-based
CNN accelerators, to dynamically maximize the size compatibility between the
accelerator hardware components and the CNN tensors they process. We
classify the state-of-the-art
MRR-based CNN accelerators from prior works into two categories, based on the
layout and relative placements of the utilized hardware components in the
accelerators. We then use our method to introduce reconfigurability in
accelerators from these two classes, thereby improving their parallelism,
their flexibility in efficiently mapping tensors of different sizes, their
speed, and their overall energy efficiency. We evaluate our reconfigurable
accelerators against three prior works under an area-proportionate outlook
(equal hardware area for all accelerators). Our evaluation of the inference
of four modern CNNs
indicates that our designed reconfigurable CNN accelerators provide
improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W,
compared to an MRR-based accelerator from prior work.
Comment: Paper accepted at CASES (ESWEEK) 202
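The low-utilization problem with mixed-sized tensors can be illustrated with a toy model; the 64-lane unit size and the layer dimensions below are assumptions for illustration, not figures from the paper:

```python
# Illustrative model (assumed numbers): mapping dot products of very different
# lengths onto a fixed-size processing unit wastes lanes on the short ones.
import math

def utilization(vector_length, unit_size):
    """Fraction of lanes doing useful work when a dot product of
    vector_length elements is tiled across units of unit_size lanes."""
    units_needed = math.ceil(vector_length / unit_size)
    return vector_length / (units_needed * unit_size)

UNIT = 64  # fixed dot-product unit size (assumed)
# Standard conv layer: 3x3x256 = 2304 inputs per output; a depthwise layer
# in a depthwise separable CNN: 3x3x1 = 9 inputs per output.
for n in (2304, 9):
    print(n, round(utilization(n, UNIT), 3))
```

The long vector fills every lane, while the depthwise vector uses about 14% of one unit, which is the kind of mismatch a reconfigurable design can close by resizing or regrouping its components.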
AGNI: In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning
Recent years have seen a rapid increase in research activity in the field of
DRAM-based Processing-In-Memory (PIM) accelerators, where the analog computing
capability of DRAM is employed by minimally changing the inherent structure of
DRAM peripherals to accelerate various data-centric applications. Several
DRAM-based PIM accelerators for Convolutional Neural Networks (CNNs) have also
been reported. Among these, the accelerators leveraging in-DRAM stochastic
arithmetic have shown manifold improvements in processing latency and
throughput, due to the ability of stochastic arithmetic to convert
multiplications into simple bit-wise logical AND operations. However, the use
of in-DRAM stochastic arithmetic for CNN acceleration requires frequent
stochastic-to-binary number conversions. For that, prior works employ
full-adder-based or serial-counter-based in-DRAM circuits. These circuits
consume large area and
incur long latency. Their in-DRAM implementations also require heavy
modifications in DRAM peripherals, which significantly diminishes the benefits
of using stochastic arithmetic in these accelerators. To address these
shortcomings, this paper presents a new substrate for in-DRAM
stochastic-to-binary number conversion called AGNI. AGNI makes minor
modifications in DRAM peripherals using pass transistors, capacitors, encoders,
and charge pumps, and re-purposes the sense amplifiers as voltage comparators,
to enable in-situ binary conversion of input stochastic operands of different
sizes with iso-latency.
Comment: (Preprint) To appear at ISQED 202
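The pipeline this abstract describes (AND-based multiplication followed by stochastic-to-binary conversion) can be modeled in a few lines of software; the random encoding and stream length are illustrative assumptions, and the final counting step stands in for the hardware conversion circuits:

```python
# Software model (assumed parameters, not the in-DRAM circuit): multiplication
# of two stochastic bitstreams is a bit-wise AND, and stochastic-to-binary
# conversion is a count of the ones in the result stream.
import random

random.seed(0)
N = 4096  # stream length (assumed)

def stochastic_encode(p, length=N):
    """Encode probability p as a random bitstream with P(bit = 1) = p."""
    return [1 if random.random() < p else 0 for _ in range(length)]

def and_multiply(sa, sb):
    """Independent stochastic streams AND to a stream representing a*b."""
    return [x & y for x, y in zip(sa, sb)]

def to_binary(stream):
    """Stochastic-to-binary conversion: the counting step that prior works
    implement with full adders or serial counters, and that AGNI performs
    in-situ with repurposed sense amplifiers."""
    return sum(stream) / len(stream)

product = to_binary(and_multiply(stochastic_encode(0.5), stochastic_encode(0.5)))
print(product)  # close to 0.25; statistical error shrinks with stream length
```

The model makes the cost structure visible: the multiply itself is a single AND per bit, so the conversion step dominates, which is why replacing the digital counter with an iso-latency in-situ mechanism matters.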