Memory and information processing in neuromorphic systems
A striking difference between brain-inspired neuromorphic processors and current von Neumann processor architectures is the way in which memory and processing are organized. While Information and Communication Technologies continue to address the need for increased computational power by increasing the number of cores within a digital processor, neuromorphic engineers and scientists can complement this approach by building processor architectures in which memory is distributed alongside the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones, and from purely digital systems to mixed analog/digital systems which implement more biologically realistic models of neurons and synapses, together with a suite of adaptation and learning mechanisms analogous to those found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems.
Comment: Submitted to Proceedings of the IEEE; review of recently proposed neuromorphic computing platforms and systems
Neuromorphic deep convolutional neural network learning systems for FPGA in real time
Deep Learning algorithms have become one of the best approaches for pattern recognition in several fields, including computer vision, speech recognition, natural language processing, and audio recognition, among others. In image vision, convolutional neural networks stand out due to their relatively simple supervised training and their efficiency in extracting features from a scene. Nowadays, several implementations of convolutional neural network accelerators exist that manage to run these networks in real time. However, the number of operations and the power consumption of these implementations can be reduced by using a different processing paradigm, such as neuromorphic engineering.
The field of neuromorphic engineering studies the behavior of biological neural systems and the inner workings of human neural processing, with the purpose of designing analog, digital, or mixed-signal systems that solve problems inspired by how the human brain performs complex tasks, replicating the behavior and properties of biological neurons. Neuromorphic engineering tries to answer how our brain is able to learn and perform complex tasks with high efficiency under the paradigm of spike-based computation.
This thesis explores both the frame-based and the spike-based processing paradigms for the development of hardware architectures for visual pattern recognition based on convolutional neural networks. In this work, two FPGA implementations of convolutional neural network accelerator architectures for frame-based processing, using OpenCL and SoC technologies, are presented. These are followed by a novel neuromorphic convolution processor for the spike-based processing paradigm, which implements the behaviour of the leaky integrate-and-fire neuron model. Furthermore, it reads the data in rows, being able to perform multiple layers on the same chip. Finally, a novel FPGA implementation of the Hierarchy of Time Surfaces algorithm and a new memory model for spike-based systems are proposed.
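The leaky integrate-and-fire neuron model mentioned above can be illustrated with a minimal discrete-time software sketch (the parameter values here are illustrative choices, not taken from the thesis or its hardware):

```python
def lif_simulate(input_current, v_thresh=1.0, leak=0.95, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron: the membrane
    potential leaks each step, integrates the input, and emits a
    spike (then resets) when it crosses the threshold."""
    v = 0.0
    spikes = []
    for i in input_current:
        v = leak * v + i          # leaky integration
        if v >= v_thresh:
            spikes.append(1)      # threshold crossed: fire
            v = v_reset           # reset after firing
        else:
            spikes.append(0)
    return spikes

# Under constant drive the neuron fires periodically
spikes = lif_simulate([0.3] * 20)
print(sum(spikes))  # → 5
```

With this leak factor and drive, the membrane crosses threshold every fourth step, giving the regular output spike train characteristic of LIF dynamics.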
Learning from minimally labeled data with accelerated convolutional neural networks
The main objective of an Artificial Vision Algorithm is to design a mapping function that takes an image as an input and correctly classifies it into one of the user-determined categories. There are several important properties to be satisfied by the mapping function for visual understanding. First, the function should produce good representations of the visual world, which will be able to recognize images independently of pose, scale and illumination. Furthermore, the designed artificial vision system has to learn these representations by itself. Recent studies on Convolutional Neural Networks (ConvNets) produced promising advancements in visual understanding. These networks attain significant performance upgrades by relying on hierarchical structures inspired by biological vision systems. In my research, I work mainly in two areas: 1) how ConvNets can be programmed to learn the optimal mapping function using the minimum amount of labeled data, and 2) how these networks can be accelerated for practical purposes. In this work, algorithms that learn from unlabeled data are studied. A new framework that exploits unlabeled data is proposed. The proposed framework obtains state-of-the-art performance results in different tasks.
Furthermore, this study presents an optimized streaming method for a ConvNet hardware accelerator on an embedded platform. It is tested on object classification and detection applications using ConvNets. Experimental results indicate high computational efficiency and significant performance upgrades over all other existing platforms.
ED-Scorbot: A Robotic Test-bed Framework for FPGA-based Neuromorphic Systems
Neuromorphic engineering is a growing and promising discipline nowadays. Neuro-inspiration and brain understanding applied to solving engineering problems are boosting new architectures, solutions, and products today. The biological brain and neural systems process information at relatively low speeds through small components, called neurons, and it is impressive how they connect to each other to construct complex architectures that solve, in a quasi-instantaneous way and with very low power, visual and audio processing tasks such as object detection and tracking, target approximation, and grasping. Neuromorphs are beginning to be very promising for a new era in the development of new sensors, processors, robots, and software systems that mimic these biological systems. The event-driven Scorbot (ED-Scorbot) is a robotic arm plus a set of FPGA/microcontroller boards and a library of FPGA logic, joined in a completely event-based (spike-based) framework from the sensors to the actuators. It is located at the University of Seville and can be used remotely. Spike-based commands, through neuro-inspired motor controllers, can be sent to the robot after visual object detection and tracking for grasping or manipulation, after complex visual and audio-visual sensory fusion, or after performing a learning task. Thanks to the cascaded FPGA architecture over the Address-Event-Representation (AER) bus, supported by specialized boards, resources for algorithm implementation are not limited.
Ministerio de Economía y Competitividad TEC2012-37868-C04-02; Junta de Andalucía P12-TIC-130
Neuromorphic LIF Row-by-Row Multiconvolution Processor for FPGA
Deep Learning algorithms have become state-of-the-art methods for multiple fields, including computer vision, speech recognition, natural language processing, and audio recognition, among others. In image vision, convolutional neural networks (CNN) stand out. This kind of network is expensive in terms of computational resources due to the large number of operations required to process a frame. In recent years, several frame-based chip solutions to deploy CNNs in real time have been developed. Despite the good results in power and accuracy given by these solutions, the number of operations is still high, due to the complexity of current network models. However, it is possible to reduce the number of operations using computer vision techniques other than frame-based ones, e.g., neuromorphic event-based techniques. There exist several neuromorphic vision sensors whose pixels detect changes in luminosity. Inspired by the leaky integrate-and-fire (LIF) neuron, we propose in this manuscript an event-based field-programmable gate array (FPGA) multiconvolution system. Its main novelty is the combination of a memory arbiter for efficient memory access to allow row-by-row kernel processing. This system is able to convolve 64 filters across multiple kernel sizes, from 1 × 1 to 7 × 7, with latencies of 1.3 μs and 9.01 μs, respectively, generating a continuous flow of output events. The proposed architecture will easily fit spike-based CNNs.
Ministerio de Economía y Competitividad TEC2016-77785-
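As an illustration of the event-based convolution paradigm this abstract describes (a behavioral software sketch, not the paper's row-by-row FPGA pipeline; all sizes and thresholds here are toy assumptions): each incoming event stamps the kernel onto a map of membrane potentials, and any output neuron crossing threshold fires an event and resets.

```python
import numpy as np

def event_convolution(events, kernel, v_thresh=1.0, shape=(32, 32)):
    """Event-driven convolution with integrate-and-fire outputs:
    each input event (x, y) adds the kernel to the membrane
    potentials of its neighborhood; neurons crossing the threshold
    emit an output event and reset."""
    k = kernel.shape[0] // 2                   # kernel radius (odd sizes)
    v = np.zeros(shape)                        # membrane potential map
    out_events = []
    for (x, y) in events:
        # Clip the kernel stamp to the map borders
        x0, x1 = max(x - k, 0), min(x + k + 1, shape[0])
        y0, y1 = max(y - k, 0), min(y + k + 1, shape[1])
        kx0, ky0 = x0 - (x - k), y0 - (y - k)
        v[x0:x1, y0:y1] += kernel[kx0:kx0 + (x1 - x0),
                                  ky0:ky0 + (y1 - y0)]
        for fx, fy in np.argwhere(v >= v_thresh):
            out_events.append((fx, fy))        # output event stream
            v[fx, fy] = 0.0                    # reset fired neurons
    return out_events

kernel = np.full((3, 3), 0.4)                  # toy 3x3 kernel
ins = [(10, 10), (10, 10), (10, 11)]           # repeated nearby events
outs = event_convolution(ins, kernel)
print(len(outs))  # → 6
```

Note that work is done only when events arrive, which is why sparse input activity translates directly into fewer operations than a frame-based convolution.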
Algorithm and Architecture Co-design for High-performance Digital Signal Processing.
CMOS scaling has been the driving force behind the revolution of digital signal processing (DSP) systems, but scaling is slowing down and the CMOS device is approaching its fundamental scaling limit. At the same time, DSP algorithms are continuing to evolve, so there is a growing gap between the increasing complexities of the algorithms and what is practically implementable. The gap can be bridged by exploring the synergy between algorithm and hardware design, using the so-called co-design techniques.
In this thesis, algorithm and architecture co-design techniques are applied to X-ray computed tomography (CT) image reconstruction. Analysis of fixed-point quantization and CT geometry identifies an optimal word length and a mismatch between the object and projection grids. A water-filling buffer is designed to resolve the grid mismatch, and is combined with parallel fixed-point arithmetic units to improve the throughput. The analysis eventually leads to an out-of-order scheduling architecture that reduces the off-chip memory access by three orders of magnitude.
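The fixed-point word-length analysis mentioned above can be illustrated with a minimal round-and-saturate quantizer (the word length and fraction bits below are arbitrary example choices, not the thesis's optimal values):

```python
def quantize(value, word_len=16, frac_bits=14):
    """Quantize a real value to signed fixed point: round to the
    nearest representable step, then saturate to the word's range."""
    step = 2.0 ** -frac_bits
    q = round(value / step)               # round to nearest step
    q_max = (1 << (word_len - 1)) - 1     # largest signed code
    q_min = -(1 << (word_len - 1))        # smallest signed code
    return max(q_min, min(q_max, q)) * step

print(quantize(0.1))    # nearest multiple of 2^-14
print(quantize(100.0))  # saturates at the positive range limit
```

Choosing the word length then amounts to trading this rounding and saturation error against hardware cost, which is the kind of analysis the thesis performs for the CT reconstruction datapath.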
The co-design techniques are further applied to the design of neural networks for sparse coding. Analysis of the neuron spiking dynamics leads to the optimal tuning of network size, spiking rate, and update step size to keep the spiking sparse. The resulting sparsity enables a bus-ring architecture to achieve both high throughput and scalability. A 65nm CMOS chip implementing the architecture demonstrates feature extraction at a throughput of 1.24G pixel/s at 1.0V and 310MHz. The error tolerance of sparse coding can be exploited to enhance the energy efficiency.
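Neural-network sparse coding of the kind described above is commonly formulated as the locally competitive algorithm (LCA), in which neurons suppress each other through lateral inhibition until only a few stay active. A minimal sketch with random data and assumed parameters (not the chip's implementation):

```python
import numpy as np

def soft_threshold(u, lam):
    # Neurons whose internal state is below threshold stay silent
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca_sparse_code(x, D, lam=0.1, dt=0.05, steps=200):
    """Locally competitive algorithm: leaky integration driven by
    D^T x, with lateral inhibition (D^T D - I) enforcing sparsity."""
    n_atoms = D.shape[1]
    u = np.zeros(n_atoms)              # internal (membrane-like) states
    b = D.T @ x                        # feedforward drive
    G = D.T @ D - np.eye(n_atoms)      # lateral inhibition weights
    for _ in range(steps):
        a = soft_threshold(u, lam)     # active coefficients (sparse)
        u += dt * (b - u - G @ a)      # leaky competitive dynamics
    return soft_threshold(u, lam)

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)         # unit-norm dictionary atoms
x = 2.0 * D[:, 0]                      # signal built from one atom
a = lca_sparse_code(x, D)
print(np.count_nonzero(a), "active of", a.size)
```

The resulting code is dominated by the generating atom while most coefficients are exactly zero; it is this sparsity of activity that a hardware architecture can exploit for throughput and energy efficiency.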
As a natural next step after the sparse coding chip, a neural-inspired inference module (IM) is designed for object recognition. The object recognition chip consists of an IM based on sparse coding and an event-driven classifier. A learning co-processor is integrated on chip to enable on-chip learning. The throughput and energy efficiency are further improved using architectural techniques including sub-dividing the IM and classifier into modules and optimal pipelining. The result is a 65nm CMOS chip that performs sparse coding at 10.16G pixel/s at 1.0V and 635MHz.
The co-design techniques can be applied to the design of other advanced DSP algorithms for emerging applications.
PhD thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113344/1/jungkook_1.pd