MorphIC: A 65-nm 738k-Synapse/mm² Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning
Recent trends in the field of neural network accelerators investigate weight
quantization as a means to increase the resource- and power-efficiency of
hardware devices. As full on-chip weight storage is necessary to avoid the high
energy cost of off-chip memory accesses, memory reduction requirements for
weight storage pushed toward the use of binary weights, which were demonstrated
to have a limited accuracy reduction on many applications when
quantization-aware training techniques are used. In parallel, spiking neural
network (SNN) architectures are explored to further reduce power when
processing sparse event-based data streams, while on-chip spike-based online
learning appears as a key feature for applications constrained in power and
resources during the training phase. However, designing power- and
area-efficient spiking neural networks still requires the development of
specific techniques in order to leverage on-chip online learning on binary
weights without compromising the synapse density. In this work, we demonstrate
MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a
stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning
rule and a hierarchical routing fabric for large-scale chip interconnection.
The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF)
neurons and more than two million plastic synapses for an active silicon area
of 2.86 mm² in 65 nm CMOS, achieving a high density of 738k synapses/mm².
MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy
tradeoff on the MNIST classification task compared to previously-proposed SNNs,
while having no penalty in the energy-accuracy tradeoff.
Comment: This document is the paper as accepted for publication in the IEEE
Transactions on Biomedical Circuits and Systems journal (2019); the
fully-edited paper is available at
https://ieeexplore.ieee.org/document/876400
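The area and density figures quoted in the MorphIC abstract can be cross-checked with a short calculation. This is only a consistency sketch: it assumes the area is in mm² and the density is per mm², and the constant names are made up for the example.

```python
# Figures quoted in the abstract (units assumed: mm^2 and synapses per mm^2).
AREA_MM2 = 2.86
DENSITY_PER_MM2 = 738_000

# Synapse count implied by density x area.
implied_synapses = AREA_MM2 * DENSITY_PER_MM2

print(f"implied synapse count: {implied_synapses:,.0f}")
```

The product comes out slightly above two million, which agrees with the abstract's "more than two million plastic synapses".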
Computation using Noise-based Logic: Efficient String Verification over a Slow Communication Channel
Utilizing the hyperspace of noise-based logic, we show two string
verification methods with low communication complexity. One of them is based on
continuum noise-based logic. The other one utilizes noise-based logic with
random telegraph signals where a mathematical analysis of the error probability
is also given. The last operation can also be interpreted as computing
universal hash functions with noise-based logic and using them for string
comparison. To find out with 10^-25 error probability that two strings with
arbitrary length are different (this value is similar to the error probability
of an idealistic gate in today's computer) Alice and Bob need to compare only
83 bits of the noise-based hyperspace.
Comment: Accepted for publication in European Journal of Physics B (November
10, 2010)
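The link between 83 compared bits and a 10^-25 error probability can be illustrated with a classical analogue. The sketch below is not the noise-based logic scheme of the paper. It assumes a standard universal hash family (random parity checks over the input bits) and a shared seed standing in for randomness shared by Alice and Bob. The names `fingerprint`, `probably_equal`, and `ERROR_BOUND` are hypothetical.

```python
import random

K_BITS = 83                      # number of compared bits, as in the abstract
ERROR_BOUND = 2.0 ** -K_BITS     # chance that two different strings collide


def fingerprint(value: int, n_bits: int, seed: int, k: int = K_BITS) -> int:
    """Return a k-bit fingerprint; each bit is the parity of value AND a random mask."""
    rng = random.Random(seed)    # same seed on both sides gives the same masks
    out = 0
    for _ in range(k):
        mask = rng.getrandbits(n_bits)
        parity = bin(value & mask).count("1") & 1
        out = (out << 1) | parity
    return out


def probably_equal(a: bytes, b: bytes, seed: int) -> bool:
    """Compare two strings by exchanging only K_BITS bits of fingerprint."""
    if len(a) != len(b):
        return False
    n_bits = 8 * len(a)
    if n_bits == 0:
        return True
    x = int.from_bytes(a, "big")
    y = int.from_bytes(b, "big")
    return fingerprint(x, n_bits, seed) == fingerprint(y, n_bits, seed)


print(f"error bound for {K_BITS} bits: {ERROR_BOUND:.3e}")
```

For two different strings of equal length, each random parity check detects the difference with probability 1/2, so 83 independent checks all miss it with probability 2^-83, which is about 10^-25.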
The effect of heterogeneity on decorrelation mechanisms in spiking neural networks: a neuromorphic-hardware study
High-level brain function such as memory, classification or reasoning can be
realized by means of recurrent networks of simplified model neurons. Analog
neuromorphic hardware constitutes a fast and energy efficient substrate for the
implementation of such neural computing architectures in technical applications
and neuroscientific research. The functional performance of neural networks is
often critically dependent on the level of correlations in the neural activity.
In finite networks, correlations are typically inevitable due to shared
presynaptic input. Recent theoretical studies have shown that inhibitory
feedback, abundant in biological neural networks, can actively suppress these
shared-input correlations and thereby enable neurons to fire nearly
independently. For networks of spiking neurons, the decorrelating effect of
inhibitory feedback has so far been explicitly demonstrated only for
homogeneous networks of neurons with linear sub-threshold dynamics. Theory,
however, suggests that the effect is a general phenomenon, present in any
system with sufficient inhibitory feedback, irrespective of the details of the
network structure or the neuronal and synaptic properties. Here, we investigate
the effect of network heterogeneity on correlations in sparse, random networks
of inhibitory neurons with non-linear, conductance-based synapses. Emulations
of these networks on the analog neuromorphic hardware system Spikey allow us to
test the efficiency of decorrelation by inhibitory feedback in the presence of
hardware-specific heterogeneities. The configurability of the hardware
substrate enables us to modulate the extent of heterogeneity in a systematic
manner. We selectively study the effects of shared input and recurrent
connections on correlations in membrane potentials and spike trains. Our
results confirm ...
Comment: 20 pages, 10 figures, supplement
Throughput-optimal systolic arrays from recurrence equations
Many compute-bound software kernels have seen order-of-magnitude speedups on special-purpose accelerators built on specialized architectures such as field-programmable gate arrays (FPGAs). These architectures are particularly good at implementing dynamic programming algorithms that can be expressed as systems of recurrence equations, which in turn can be realized as systolic array designs. To efficiently find good realizations of an algorithm for a given hardware platform, we pursue software tools that can search the space of possible parallel array designs to optimize various design criteria. Most existing design tools in this area produce a design that is latency-space optimal. However, we instead wish to target applications that operate on a large collection of small inputs, e.g. a database of biological sequences. For such applications, overall throughput rather than latency per input is the most important measure of performance. In this work, we introduce a new procedure to optimize throughput of a systolic array subject to resource constraints, in this case the area and bandwidth constraints of an FPGA device. We show that the throughput of an array is dependent on the maximum number of lattice points executed by any processor in the array, which to a close approximation is determined solely by the array’s projection vector. We describe a bounded search process to find throughput-optimal projection vectors and a tool to perform automated design space exploration, discovering a range of array designs that are optimal for inputs of different sizes. We apply our techniques to the Nussinov RNA folding algorithm to generate multiple mappings of this algorithm into systolic arrays. By combining our library of designs with run-time reconfiguration of an FPGA device to dynamically switch among them, we predict significant speedup over a single, latency-space optimal array
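To show what kind of recurrence system the abstract has in mind, here is a minimal software sketch of the Nussinov base-pair maximization recurrence. It is a plain sequential reference, not the authors' systolic array design or search tool. The function name `nussinov`, the `min_loop` parameter, and the inclusion of G-U wobble pairs are assumptions made for the example.

```python
def nussinov(seq: str, min_loop: int = 0) -> int:
    """Maximum number of nested base pairs in seq (Nussinov recurrence)."""
    # Allowed pairs; including G-U wobble pairs is an assumption of this sketch.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # table[i][j] holds the best score for the subsequence seq[i..j].
    table = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # fill by increasing j - i
        for i in range(n - span):
            j = i + span
            best = max(table[i + 1][j],      # base i left unpaired
                       table[i][j - 1])      # base j left unpaired
            if (seq[i], seq[j]) in pairs and j - i > min_loop:
                inner = table[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)  # i pairs with j
            for k in range(i + 1, j):        # split into two subproblems
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table[0][n - 1]


print(nussinov("GGGCCC"))
```

Each table cell (i, j) is one point of the recurrence's iteration domain. A systolic mapping would assign these points to processors and time steps, which is where the choice of projection vector discussed in the abstract comes in.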
Second year technical report on-board processing for future satellite communications systems
Advanced baseband and microwave switching techniques for large domestic communications satellites operating in the 30/20 GHz frequency bands are discussed. The nominal baseband processor throughput is one million packets per second (1.6 Gb/s) from one thousand T1 carrier rate customer premises terminals. A frequency reuse factor of sixteen is assumed by using 16 spot antenna beams with the same 100 MHz bandwidth per beam and a modulation with a one b/s per Hz bandwidth efficiency. Eight of the beams are fixed on major metropolitan areas and eight are scanning beams which periodically cover the remainder of the U.S. under dynamic control. User signals are regenerated (demodulated/remodulated) and message packages are reformatted on board. Frequency division multiple access and time division multiplex are employed on the uplinks and downlinks, respectively, for terminals within the coverage area and dwell interval of a scanning beam. Link establishment and packet routing protocols are defined. Also described is a detailed design of a separate 100 x 100 microwave switch capable of handling nonregenerated signals occupying the remaining 2.4 GHz bandwidth with 60 dB of isolation, at an estimated weight and power consumption of approximately 400 kg and 100 W, respectively
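The throughput figures in this abstract are mutually consistent, as a quick calculation shows. The sketch assumes the standard T1 line rate of 1.544 Mb/s. The implied packet length is derived here and is not stated in the abstract, and the variable names are hypothetical.

```python
# Standard T1 line rate in bits per second (assumption: 1.544 Mb/s).
T1_RATE_BPS = 1.544e6
TERMINALS = 1000

# Aggregate user traffic offered by one thousand T1-rate terminals.
aggregate_bps = T1_RATE_BPS * TERMINALS

# Capacity from frequency reuse: 16 beams x 100 MHz x 1 b/s per Hz.
BEAMS = 16
BANDWIDTH_HZ = 100e6
EFFICIENCY_BPS_PER_HZ = 1.0
capacity_bps = BEAMS * BANDWIDTH_HZ * EFFICIENCY_BPS_PER_HZ

# Packet length implied by 1.6 Gb/s at one million packets per second
# (derived here, not stated in the abstract).
PACKETS_PER_SECOND = 1e6
implied_bits_per_packet = capacity_bps / PACKETS_PER_SECOND

print(f"aggregate T1 traffic: {aggregate_bps / 1e9:.3f} Gb/s")
print(f"beam capacity:        {capacity_bps / 1e9:.3f} Gb/s")
print(f"implied packet size:  {implied_bits_per_packet:.0f} bits")
```

One thousand T1 terminals offer about 1.54 Gb/s, which fits within the nominal 1.6 Gb/s that sixteen 100 MHz beams provide at one b/s per Hz.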
Optoelectronic Reservoir Computing
Reservoir computing is a recently introduced, highly efficient bio-inspired
approach for processing time dependent data. The basic scheme of reservoir
computing consists of a non linear recurrent dynamical system coupled to a
single input layer and a single output layer. Within these constraints many
implementations are possible. Here we report an opto-electronic implementation
of reservoir computing based on a recently proposed architecture consisting of
a single non linear node and a delay line. Our implementation is sufficiently
fast for real time information processing. We illustrate its performance on
tasks of practical importance such as nonlinear channel equalization and speech
recognition, and obtain results comparable to state of the art digital
implementations.
Comment: Contains main paper and two Supplementary Material
NASA SERC 1990 Symposium on VLSI Design
This document contains papers presented at the first annual NASA Symposium on VLSI Design. NASA's involvement in this event demonstrates a need for research and development in high performance computing. High performance computing addresses problems faced by the scientific and industrial communities. High performance computing is needed in: (1) real-time manipulation of large data sets; (2) advanced systems control of spacecraft; (3) digital data transmission, error correction, and image compression; and (4) expert system control of spacecraft. Clearly, a valuable technology in meeting these needs is Very Large Scale Integration (VLSI). This conference addresses the following issues in VLSI design: (1) system architectures; (2) electronics; (3) algorithms; and (4) CAD tools
Wideband CMOS Data Converters for Linear and Efficient mmWave Transmitters
With continuously increasing demands for wireless connectivity, higher carrier frequencies and wider bandwidths are explored. To overcome a limited transmit power at these higher carrier frequencies, multiple input multiple output (MIMO) systems, with a large number of transmitters and antennas, are used to direct the transmitted power towards the user. With a large transmitter count, each individual transmitter needs to be small and allow for tight integration with digital circuits. In addition, modern communication standards require linear transmitters, making linearity an important factor in the transmitter design.
In this thesis, radio frequency digital-to-analog converter (RF-DAC)-based transmitters are explored. They shift the transition from digital to analog closer to the antennas, performing both digital-to-analog conversion and up-conversion in a single block. To reduce the need for computationally costly digital predistortion (DPD), a linear and well-behaved RF-DAC transfer characteristic is desirable. The combination of non-overlapping local oscillator (LO) signals and an expanding segmented non-linear RF-DAC scaling is evaluated as a way to linearize the transmitter. This linearization concept has been studied both for the linearization of the RF-DAC itself and for the joint linearization of the cascaded RF-DAC-based modulator and power amplifier (PA) combination. To adapt the linearization, observation receivers are needed. In these, high-speed analog-to-digital converters (ADCs) have a central role. A high-speed ADC has been designed and evaluated to understand how concepts used to increase the sample rate affect the dynamic performance