20 research outputs found
An AER handshake-less modular infrastructure PCB with x8 2.5Gbps LVDS serial links
Nowadays spike-based brain processing emulation is
taking off. Several EU and others worldwide projects are
demonstrating this, like SpiNNaker, BrainScaleS, FACETS, or
NeuroGrid. The larger the brain process emulation on silicon is,
the higher the communication performance of the hosting
platforms has to be. Many times the bottleneck of these system
implementations is not on the performance inside a chip or a
board, but in the communication between boards. This paper
describes a novel modular Address-Event-Representation (AER)
FPGA-based (Spartan6) infrastructure PCB (the AER-Node
board) with 2.5Gbps LVDS high speed serial links over SATA
cables that offers a peak performance of 32-bit 62.5Meps (Mega
events per second) on board-to-board communications. The
board allows back compatibility with parallel AER devices
supporting up to x2 28-bit parallel data with asynchronous
handshake. These boards also allow modular expansion
functionality through several daughter boards. The paper is
focused on describing in detail the LVDS serial interface and
presenting its performance.Ministerio de Ciencia e Innovación TEC2009-10639-C04-02/01Ministerio de Economía y Competitividad TEC2012-37868-C04-02/01Junta de Andalucía TIC-6091Ministerio de Economía y Competitividad PRI-PIMCHI-2011-076
A differential memristive synapse circuit for on-line learning in neuromorphic computing systems
Spike-based learning with memristive devices in neuromorphic computing
architectures typically uses learning circuits that require overlapping pulses
from pre- and post-synaptic nodes. This imposes severe constraints on the
length of the pulses transmitted in the network, and on the network's
throughput. Furthermore, most of these circuits do not decouple the currents
flowing through memristive devices from the one stimulating the target neuron.
This can be a problem when using devices with high conductance values, because
of the resulting large currents. In this paper we propose a novel circuit that
decouples the current produced by the memristive device from the one used to
stimulate the post-synaptic neuron, by using a novel differential scheme based
on the Gilbert normalizer circuit. We show how this circuit is useful for
reducing the effect of variability in the memristive devices, and how it is
ideally suited for spike-based learning mechanisms that do not require
overlapping pre- and post-synaptic pulses. We demonstrate the features of the
proposed synapse circuit with SPICE simulations, and validate its learning
properties with high-level behavioral network simulations which use a
stochastic gradient descent learning rule in two classification tasks.Comment: 18 Pages main text, 9 pages of supplementary text, 19 figures.
Patente
Recommended from our members
Efficient spiking neural network model of pattern motion selectivity in visual cortex
Simulating large-scale models of biological motion perception is challenging, due to the required memory to store the network structure and the computational power needed to quickly solve the neuronal dynamics. A low-cost yet high-performance approach to simulating large-scale neural network models in real-time is to leverage the parallel processing capability of graphics processing units (GPUs). Based on this approach, we present a two-stage model of visual area MT that we believe to be the first large-scale spiking network to demonstrate pattern direction selectivity. In this model, component-direction- selective (CDS) cells in MT linearly combine inputs from V1 cells that have spatiotemporal receptive fields according to the motion energy model of Simoncelli and Heeger. Pattern-direction-selective (PDS) cells in MT are constructed by pooling over MT CDS cells with a wide range of preferred directions. Responses of our model neurons are comparable to electrophysiological results for grating and plaid stimuli as well as speed tuning. The behavioral response of the network in a motion discrimination task is in agreement with psychophysical data. Moreover, our implementation outperforms a previous implementation of the motion energy model by orders of magnitude in terms of computational speed and memory usage. The full network, which comprises 153,216 neurons and approximately 40 million synapses, processes 20 frames per second of a 40∈×∈40 input video in real-time using a single off-the-shelf GPU. To promote the use of this algorithm among neuroscientists and computer vision researchers, the source code for the simulator, the network, and analysis scripts are publicly available. © 2014 Springer Science+Business Media New York
FPGA ACCELERATION OF A CORTICAL AND A MATCHED FILTER-BASED ALGORITHM
Digital image processing is a widely used and diverse field. It is used in a broad array of areas such as tracking and detection, object avoidance, computer vision, and numerous other applications. For many image processing tasks, the computations can become time consuming. Therefore, a means for accelerating the computations would be beneficial. Using that as motivation, this thesis examines the acceleration of two distinctly different image processing applications. The first image processing application examined is a recent neocortex inspired cognitive model geared towards pattern recognition as seen in the visual cortex. For this model, both software and reconfigurable logic based FPGA implementations of the model are examined on a Cray XD1. Results indicate that hardware-acceleration can provide average throughput gains of 75 times over software-only implementations of the networks examined when utilizing the full resources of the Cray XD1. The second image processing application examined is matched filter-based position detection. This approach is at the heart of the automatic alignment algorithm currently being tested in the National Ignition Faculty presently under construction at the Lawrence Livermore National Laboratory. To reduce the processing time of the match filtering, a reconfigurable logic architecture was developed. Results show that the reconfigurable logic architecture provides a speedup of approximately 253 times over an optimized software implementation
Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons
The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons
Algorithm and Architecture Co-design for High-performance Digital Signal Processing.
CMOS scaling has been the driving force behind the revolution of digital signal processing (DSP) systems, but scaling is slowing down and the CMOS device is approaching its fundamental scaling limit. At the same time, DSP algorithms are continuing to evolve, so there is a growing gap between the increasing complexities of the algorithms and what is practically implementable. The gap can be bridged by exploring the synergy between algorithm and hardware design, using the so-called co-design techniques.
In this thesis, algorithm and architecture co-design techniques are applied to X-ray computed tomography (CT) image reconstruction. Analysis of fixed-point quantization and CT geometry identifies an optimal word length and a mismatch between the object and projection grids. A water-filling buffer is designed to resolve the grid mismatch, and is combined with parallel fixed-point arithmetic units to improve the throughput. The analysis eventually leads to an out-of-order scheduling architecture that reduces the off-chip memory access by three orders of magnitude.
The co-design techniques are further applied to the design of neural networks for sparse coding. Analysis of the neuron spiking dynamics leads to the optimal tuning of network size, spiking rate, and update step size to keep the spiking sparse. The resulting sparsity enables a bus-ring architecture to achieve both high throughput and scalability. A 65nm CMOS chip implementing the architecture demonstrates feature extraction at a throughput of 1.24G pixel/s at 1.0V and 310MHz. The error tolerance of sparse coding can be exploited to enhance the energy efficiency.
As a natural next step after the sparse coding chip, a neural-inspired inference module (IM) is designed for object recognition. The object recognition chip consists of an IM based on sparse coding and an event-driven classifier. A learning co-processor is integrated on chip to enable on-chip learning. The throughput and energy efficiency are further improved using architectural techniques including sub-dividing the IM and classifier into modules and optimal pipelining. The result is a 65nm CMOS chip that performs sparse coding at 10.16G pixel/s at 1.0V and 635MHz.
The co-design techniques can be applied to the design of other advanced DSP algorithms for emerging applications.PhDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113344/1/jungkook_1.pd