52,325 research outputs found
Dynamic Power Management for Neuromorphic Many-Core Systems
This work presents a dynamic power management architecture for neuromorphic
many core systems such as SpiNNaker. A fast dynamic voltage and frequency
scaling (DVFS) technique is presented which allows the processing elements (PE)
to change their supply voltage and clock frequency individually and
autonomously within less than 100 ns. This is employed by the neuromorphic
simulation software flow, which defines the performance level (PL) of the PE
based on the actual workload within each simulation cycle. A test chip in 28 nm
SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled
from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct
PLs. By measurement of three neuromorphic benchmarks it is shown that the total
PE power consumption can be reduced by 75%, with 80% baseline power reduction
and a 50% reduction of energy per neuron and synapse computation, all while
maintaining temporary peak system performance to achieve biological real-time
operation of the system. A numerical model of this power management model is
derived which allows DVFS architecture exploration for neuromorphics. The
proposed technique is to be used for the second generation SpiNNaker
neuromorphic many core system
Communication channel analysis and real time compressed sensing for high density neural recording devices
Next generation neural recording and Brain-
Machine Interface (BMI) devices call for high density or distributed
systems with more than 1000 recording sites. As the
recording site density grows, the device generates data on the
scale of several hundred megabits per second (Mbps). Transmitting
such large amounts of data induces significant power
consumption and heat dissipation for the implanted electronics.
Facing these constraints, efficient on-chip compression techniques
become essential to the reduction of implanted systems power
consumption. This paper analyzes the communication channel
constraints for high density neural recording devices. This paper
then quantifies the improvement on communication channel
using efficient on-chip compression methods. Finally, This paper
describes a Compressed Sensing (CS) based system that can
reduce the data rate by > 10x times while using power on
the order of a few hundred nW per recording channel
Polynomial Threshold Functions, AC^0 Functions and Spectral Norms
The class of polynomial-threshold functions is studied using harmonic analysis, and the results are used to derive lower bounds related to AC^0 functions. A Boolean function is polynomial threshold if it can be represented as a sign function of a sparse polynomial (one that consists of a polynomial number of terms). The main result is that polynomial-threshold functions can be characterized by means of their spectral representation. In particular, it is proved that a Boolean function whose L_1 spectral norm is bounded by a polynomial in n is a polynomial-threshold function, and that a Boolean function whose L_∞^(-1) spectral norm is not bounded by a polynomial in n is not a polynomial-threshold function. Some results for AC^0 functions are derived
Optimized Surface Code Communication in Superconducting Quantum Computers
Quantum computing (QC) is at the cusp of a revolution. Machines with 100
quantum bits (qubits) are anticipated to be operational by 2020
[googlemachine,gambetta2015building], and several-hundred-qubit machines are
around the corner. Machines of this scale have the capacity to demonstrate
quantum supremacy, the tipping point where QC is faster than the fastest
classical alternative for a particular problem. Because error correction
techniques will be central to QC and will be the most expensive component of
quantum computation, choosing the lowest-overhead error correction scheme is
critical to overall QC success. This paper evaluates two established quantum
error correction codes---planar and double-defect surface codes---using a set
of compilation, scheduling and network simulation tools. In considering
scalable methods for optimizing both codes, we do so in the context of a full
microarchitectural and compiler analysis. Contrary to previous predictions, we
find that the simpler planar codes are sometimes more favorable for
implementation on superconducting quantum computers, especially under
conditions of high communication congestion.Comment: 14 pages, 9 figures, The 50th Annual IEEE/ACM International Symposium
on Microarchitectur
On Timing Model Extraction and Hierarchical Statistical Timing Analysis
In this paper, we investigate the challenges to apply Statistical Static
Timing Analysis (SSTA) in hierarchical design flow, where modules supplied by
IP vendors are used to hide design details for IP protection and to reduce the
complexity of design and verification. For the three basic circuit types,
combinational, flip-flop-based and latch-controlled, we propose methods to
extract timing models which contain interfacing as well as compressed internal
constraints. Using these compact timing models the runtime of full-chip timing
analysis can be reduced, while circuit details from IP vendors are not exposed.
We also propose a method to reconstruct the correlation between modules during
full-chip timing analysis. This correlation can not be incorporated into timing
models because it depends on the layout of the corresponding modules in the
chip. In addition, we investigate how to apply the extracted timing models with
the reconstructed correlation to evaluate the performance of the complete
design. Experiments demonstrate that using the extracted timing models and
reconstructed correlation full-chip timing analysis can be several times faster
than applying the flattened circuit directly, while the accuracy of statistical
timing analysis is still well maintained
- …