FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
Neural Network (NN) accelerators with emerging ReRAM (resistive random access
memory) technologies have been investigated as one of the promising solutions
to address the "memory wall" challenge, due to the unique capability of
processing-in-memory within ReRAM-crossbar-based processing elements
(PEs). However, the high efficiency and high density advantages of ReRAM have
not been fully utilized due to the huge communication demands among PEs and the
overhead of peripheral circuits.
In this paper, we propose a full system stack solution, composed of a
reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and
its software system including neural synthesizer, temporal-to-spatial mapper,
and placement & routing. We leverage the software system heavily to keep the
hardware design compact and efficient. To satisfy the high-performance
communication demand, we optimize the interconnect with a reconfigurable routing architecture
and the placement & routing tool. To improve the computational density, we
greatly simplify the PE circuit with a spiking scheme and then use the neural
synthesizer so that the high-density computation resources can support
different kinds of NN operations. In addition, we provide spiking memory blocks
(SMBs) and configurable logic blocks (CLBs) in hardware and leverage the
temporal-to-spatial mapper to utilize them to balance the storage and
computation requirements of NNs. Owing to the end-to-end software system, we can
efficiently deploy existing deep neural networks to FPSA. Evaluations show
that, compared to PRIME, one of the state-of-the-art ReRAM-based NN
accelerators, the computational density of FPSA improves by 31x; for
representative NNs, its inference performance achieves up to a 1000x speedup.
Comment: Accepted by ASPLOS 201
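The processing-in-memory capability this abstract relies on comes from the crossbar computing an analog matrix-vector product in place: input voltages drive the rows, programmed conductances encode the weights, and each column current sums the products. A minimal NumPy sketch of that idea (the function name, quantization scheme, and level count are illustrative assumptions, not part of FPSA):

```python
import numpy as np

def crossbar_mvm(weights, voltages, levels=16):
    """Model an idealized ReRAM crossbar computing a matrix-vector
    product in place: weights are quantized to a finite number of
    conductance levels (real devices have limited precision), and
    each column current is the voltage-weighted sum of that column's
    conductances (Kirchhoff's current law)."""
    w_max = np.abs(weights).max()
    # Quantize weights to the available conductance levels.
    q = np.round(weights / w_max * (levels - 1)) / (levels - 1) * w_max
    # Column currents: i_j = sum_i v_i * g_ij, i.e. one MVM per read.
    return voltages @ q

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # 4 rows (inputs) x 3 columns (outputs)
v = rng.standard_normal(4)        # input voltages
approx = crossbar_mvm(W, v)       # in-crossbar result, quantized
exact = v @ W                     # ideal digital result
```

With more conductance levels the in-crossbar result converges to the exact product; the limited-precision case is what the peripheral circuits of real designs must compensate for.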
AI and ML Accelerator Survey and Trends
This paper updates the survey of AI accelerators and processors from the past
three years. It collects and summarizes the current commercial
accelerators that have been publicly announced with peak performance and power
consumption numbers. The performance and power values are plotted on a scatter
graph, and a number of dimensions and observations from the trends on this plot
are again discussed and analyzed. Two new trends plots based on accelerator
release dates are included in this year's paper, along with the additional
trends of some neuromorphic, photonic, and memristor-based inference
accelerators.
Comment: 10 pages, 4 figures, 2022 IEEE High Performance Extreme Computing
(HPEC) Conference. arXiv admin note: substantial text overlap with
arXiv:2009.00993, arXiv:2109.0895
Concepts and Paradigms for Neuromorphic Programming
The value of neuromorphic computers depends crucially on our ability to
program them for relevant tasks. Currently, neuromorphic computers are mostly
limited to machine learning methods adapted from deep learning. However,
neuromorphic computers have potential far beyond deep learning if we can only
make use of their computational properties to harness their full power.
Neuromorphic programming will necessarily be different from conventional
programming, requiring a paradigm shift in how we think about programming in
general. The contributions of this paper are 1) a conceptual analysis of what
"programming" means in the context of neuromorphic computers and 2) an
exploration of existing programming paradigms that are promising yet overlooked
in neuromorphic computing. The goal is to expand the horizon of neuromorphic
programming methods, thereby allowing researchers to move beyond the shackles
of current methods and explore novel directions.
FireFly v2: Advancing Hardware Support for High-Performance Spiking Neural Network with a Spatiotemporal FPGA Accelerator
Spiking Neural Networks (SNNs) are expected to be a promising alternative to
Artificial Neural Networks (ANNs) due to their strong biological
interpretability and high energy efficiency. Specialized SNN hardware offers
clear advantages over general-purpose devices in terms of power and
performance. However, there's still room to advance hardware support for
state-of-the-art (SOTA) SNN algorithms and improve computation and memory
efficiency. As a further step in supporting high-performance SNNs on
specialized hardware, we introduce FireFly v2, an FPGA SNN accelerator that can
address the issue of non-spike operation in current SOTA SNN algorithms, which
presents an obstacle in the end-to-end deployment onto existing SNN hardware.
To more effectively align with the SNN characteristics, we design a
spatiotemporal dataflow that allows four dimensions of parallelism and
eliminates the need for membrane potential storage, enabling on-the-fly spike
processing and spike generation. To further improve hardware acceleration
performance, we develop a high-performance spike computing engine as a backend
based on a systolic array operating at 500-600 MHz. To the best of our
knowledge, FireFly v2 achieves the highest clock frequency among all FPGA-based
implementations. Furthermore, it stands as the first SNN accelerator capable of
supporting non-spike operations, which are commonly used in advanced SNN
algorithms. FireFly v2 doubles the throughput and DSP efficiency of our
previous version of FireFly, and it exhibits 1.33 times the DSP efficiency and
1.42 times the power efficiency of the current most advanced FPGA
accelerators.
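A core reason spiking accelerators can be efficient is that binary spikes turn multiply-accumulate into pure accumulation: a synaptic weight is either added or skipped. A small sketch of that property (the function name and shapes are assumptions for illustration, not FireFly's actual dataflow):

```python
import numpy as np

def spike_matmul(spikes, weights):
    """Multiply a binary spike matrix by a weight matrix using only
    additions: since each spike is 0 or 1, the output for a timestep
    is just the sum of the weight rows where a spike occurred. This
    is why SNN hardware can replace multipliers with adders."""
    T = spikes.shape[0]
    out = np.zeros((T, weights.shape[1]))
    for t in range(T):
        fired = np.nonzero(spikes[t])[0]     # indices of input spikes
        out[t] = weights[fired].sum(axis=0)  # accumulate, no multiplies
    return out

rng = np.random.default_rng(1)
S = (rng.random((5, 8)) < 0.3).astype(np.int8)  # 5 timesteps, 8 inputs
W = rng.standard_normal((8, 4))                 # 8 inputs, 4 outputs
```

The "non-spike operations" the abstract highlights are exactly the cases where this trick no longer applies, which is why they need dedicated hardware support.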
MAP-SNN: Mapping Spike Activities with Multiplicity, Adaptability, and Plasticity into Bio-Plausible Spiking Neural Networks
Spiking Neural Networks (SNNs) are considered more biologically realistic and
power-efficient, as they imitate the fundamental mechanisms of the human brain.
Recently, backpropagation (BP) based SNN learning algorithms that utilize deep
learning frameworks have achieved good performance. However,
bio-interpretability is partially neglected in those BP-based algorithms.
Toward bio-plausible BP-based SNNs, we consider three properties in modeling
spike activities: Multiplicity, Adaptability, and Plasticity (MAP). In terms of
multiplicity, we propose a Multiple-Spike Pattern (MSP) with multiple spike
transmission to strengthen model robustness under discrete time iteration. To
realize adaptability, we adopt Spike Frequency Adaption (SFA) under MSP to
decrease spike activities for improved efficiency. For plasticity, we propose a
trainable convolutional synapse that models spike response current to enhance
the diversity of spiking neurons for temporal feature extraction. The proposed
SNN model achieves competitive performance on neuromorphic datasets: N-MNIST
and SHD. Furthermore, experimental results demonstrate that the proposed three
aspects are significant to iterative robustness, spike efficiency, and temporal
feature extraction capability of spike activities. In summary, this work
proposes a feasible scheme for bio-inspired spike activities with MAP, offering
a new neuromorphic perspective to embed biological characteristics into spiking
neural networks.
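Of the three properties, spike frequency adaptation has a particularly simple dynamical reading: each spike raises an adaptation variable that lifts the effective threshold, so a constant input yields progressively sparser firing. A toy leaky integrate-and-fire sketch of that mechanism (parameter names and values are illustrative assumptions, not the paper's model):

```python
def lif_with_sfa(input_current=2.0, steps=200, tau_m=20.0,
                 tau_a=100.0, v_th=1.0, beta=0.2):
    """Leaky integrate-and-fire neuron with spike-frequency
    adaptation (SFA): every spike increments an adaptation variable
    `a` that is added to the threshold, so a constant input drives
    progressively sparser spiking."""
    v, a, spikes = 0.0, 0.0, []
    for _ in range(steps):
        v += (input_current - v) / tau_m  # leaky integration
        a -= a / tau_a                    # adaptation decays slowly
        if v >= v_th + a:                 # effective threshold is raised
            spikes.append(1)
            v = 0.0                       # reset membrane potential
            a += beta                     # strengthen adaptation
        else:
            spikes.append(0)
    return spikes
```

Running this with a constant input shows dense early firing that thins out as adaptation builds, which is the efficiency benefit the abstract attributes to SFA.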
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors
© 2016 Cheung, Schultz and Luk.
NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high-performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation and deliver optimized performance, for example in the degree of parallelism employed. The compilation process supports PyNN, a simulator-independent neural network description language, for configuring the processor. NeuroFlow supports a number of commonly used current- or conductance-based neuronal models, such as the integrate-and-fire and Izhikevich models, as well as the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and achieves real-time performance for up to 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times over an 8-core processor, or 2.83 times over GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
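The Izhikevich model named in this abstract is compact enough to sketch directly. A forward-Euler version with a 1 ms step (the parameters are the standard regular-spiking values from Izhikevich's 2003 paper, chosen here purely for illustration, not taken from NeuroFlow):

```python
def izhikevich(I=10.0, steps=1000, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler simulation (dt = 1 ms) of the Izhikevich neuron
    model: v is the membrane potential, u a recovery variable.
    Returns the timesteps at which the neuron spiked."""
    v, u = c, b * c
    spike_times = []
    for t in range(steps):
        v += 0.04 * v * v + 5 * v + 140 - u + I  # membrane dynamics
        u += a * (b * v - u)                     # slow recovery dynamics
        if v >= 30.0:                            # spike threshold reached
            spike_times.append(t)
            v, u = c, u + d                      # reset and adapt
    return spike_times
```

With a constant suprathreshold input this parameter set fires tonically, which is the kind of per-neuron update a platform like NeuroFlow parallelizes across FPGA processing elements.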