1,972 research outputs found
Memory and information processing in neuromorphic systems
A striking difference between brain-inspired neuromorphic processors and
current von Neumann processors architectures is the way in which memory and
processing is organized. As Information and Communication Technologies continue
to address the need for increased computational power through the increase of
cores within a digital processor, neuromorphic engineers and scientists can
complement this need by building processor architectures where memory is
distributed with the processing. In this paper we present a survey of
brain-inspired processor architectures that support models of cortical networks
and deep neural networks. These architectures range from serial clocked
implementations of multi-neuron systems to massively parallel asynchronous ones
and from purely digital systems to mixed analog/digital systems which implement
more biological-like models of neurons and synapses together with a suite of
adaptation and learning mechanisms analogous to the ones found in biological
nervous systems. We describe the advantages of the different approaches being
pursued and present the challenges that need to be addressed for building
artificial neural processing systems that can display the richness of behaviors
seen in biological systems.Comment: Submitted to Proceedings of IEEE, review of recently proposed
neuromorphic computing platforms and system
Dynamic Power Management for Neuromorphic Many-Core Systems
This work presents a dynamic power management architecture for neuromorphic
many core systems such as SpiNNaker. A fast dynamic voltage and frequency
scaling (DVFS) technique is presented which allows the processing elements (PE)
to change their supply voltage and clock frequency individually and
autonomously within less than 100 ns. This is employed by the neuromorphic
simulation software flow, which defines the performance level (PL) of the PE
based on the actual workload within each simulation cycle. A test chip in 28 nm
SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled
from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct
PLs. By measurement of three neuromorphic benchmarks it is shown that the total
PE power consumption can be reduced by 75%, with 80% baseline power reduction
and a 50% reduction of energy per neuron and synapse computation, all while
maintaining temporary peak system performance to achieve biological real-time
operation of the system. A numerical model of this power management model is
derived which allows DVFS architecture exploration for neuromorphics. The
proposed technique is to be used for the second generation SpiNNaker
neuromorphic many core system
Hardware-efficient on-line learning through pipelined truncated-error backpropagation in binary-state networks
Artificial neural networks (ANNs) trained using backpropagation are powerful
learning architectures that have achieved state-of-the-art performance in
various benchmarks. Significant effort has been devoted to developing custom
silicon devices to accelerate inference in ANNs. Accelerating the training
phase, however, has attracted relatively little attention. In this paper, we
describe a hardware-efficient on-line learning technique for feedforward
multi-layer ANNs that is based on pipelined backpropagation. Learning is
performed in parallel with inference in the forward pass, removing the need for
an explicit backward pass and requiring no extra weight lookup. By using binary
state variables in the feedforward network and ternary errors in
truncated-error backpropagation, the need for any multiplications in the
forward and backward passes is removed, and memory requirements for the
pipelining are drastically reduced. Further reduction in addition operations
owing to the sparsity in the forward neural and backpropagating error signal
paths contributes to highly efficient hardware implementation. For
proof-of-concept validation, we demonstrate on-line learning of MNIST
handwritten digit classification on a Spartan 6 FPGA interfacing with an
external 1Gb DDR2 DRAM, that shows small degradation in test error performance
compared to an equivalently sized binary ANN trained off-line using standard
back-propagation and exact errors. Our results highlight an attractive synergy
between pipelined backpropagation and binary-state networks in substantially
reducing computation and memory requirements, making pipelined on-line learning
practical in deep networks.Comment: Now also consider 0/1 binary activations. Memory access statistics
reporte
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. It, in turn, improves the IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past towards VLSI design and manufacturing. Moreover, we
discuss the scope of AI/ML applications in the future at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations
Intrinsically Evolvable Artificial Neural Networks
Dedicated hardware implementations of neural networks promise to provide faster, lower power operation when compared to software implementations executing on processors. Unfortunately, most custom hardware implementations do not support intrinsic training of these networks on-chip. The training is typically done using offline software simulations and the obtained network is synthesized and targeted to the hardware offline. The FPGA design presented here facilitates on-chip intrinsic training of artificial neural networks. Block-based neural networks (BbNN), the type of artificial neural networks implemented here, are grid-based networks neuron blocks. These networks are trained using genetic algorithms to simultaneously optimize the network structure and the internal synaptic parameters. The design supports online structure and parameter updates, and is an intrinsically evolvable BbNN platform supporting functional-level hardware evolution. Functional-level evolvable hardware (EHW) uses evolutionary algorithms to evolve interconnections and internal parameters of functional modules in reconfigurable computing systems such as FPGAs. Functional modules can be any hardware modules such as multipliers, adders, and trigonometric functions. In the implementation presented, the functional module is a neuron block. The designed platform is suitable for applications in dynamic environments, and can be adapted and retrained online. The online training capability has been demonstrated using a case study. A performance characterization model for RC implementations of BbNNs has also been presented
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
Convolutional neural networks (CNNs) have become the dominant neural network
architecture for solving many state-of-the-art (SOA) visual processing tasks.
Even though Graphical Processing Units (GPUs) are most often used in training
and deploying CNNs, their power efficiency is less than 10 GOp/s/W for
single-frame runtime inference. We propose a flexible and efficient CNN
accelerator architecture called NullHop that implements SOA CNNs useful for
low-power and low-latency application scenarios. NullHop exploits the sparsity
of neuron activations in CNNs to accelerate the computation and reduce memory
requirements. The flexible architecture allows high utilization of available
computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can
process up to 128 input and 128 output feature maps per layer in a single pass.
We implemented the proposed architecture on a Xilinx Zynq FPGA platform and
present results showing how our implementation reduces external memory
transfers and compute time in five different CNNs ranging from small ones up to
the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using
Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that
the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop
achieves an efficiency of 368%, maintains over 98% utilization of the MAC
units, and achieves a power efficiency of over 3TOp/s/W in a core area of
6.3mm. As further proof of NullHop's usability, we interfaced its FPGA
implementation with a neuromorphic event camera for real time interactive
demonstrations
- …