51,384 research outputs found

    Dynamic Power Management for Neuromorphic Many-Core Systems

    Full text link
    This work presents a dynamic power management architecture for neuromorphic many core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PE) to change their supply voltage and clock frequency individually and autonomously within less than 100 ns. This is employed by the neuromorphic simulation software flow, which defines the performance level (PL) of the PE based on the actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. By measurement of three neuromorphic benchmarks it is shown that the total PE power consumption can be reduced by 75%, with 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management model is derived which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used for the second generation SpiNNaker neuromorphic many core system

    Communication channel analysis and real time compressed sensing for high density neural recording devices

    Get PDF
    Next generation neural recording and Brain- Machine Interface (BMI) devices call for high density or distributed systems with more than 1000 recording sites. As the recording site density grows, the device generates data on the scale of several hundred megabits per second (Mbps). Transmitting such large amounts of data induces significant power consumption and heat dissipation for the implanted electronics. Facing these constraints, efficient on-chip compression techniques become essential to the reduction of implanted systems power consumption. This paper analyzes the communication channel constraints for high density neural recording devices. This paper then quantifies the improvement on communication channel using efficient on-chip compression methods. Finally, This paper describes a Compressed Sensing (CS) based system that can reduce the data rate by > 10x times while using power on the order of a few hundred nW per recording channel

    Polynomial Threshold Functions, AC^0 Functions and Spectral Norms

    Get PDF
    The class of polynomial-threshold functions is studied using harmonic analysis, and the results are used to derive lower bounds related to AC^0 functions. A Boolean function is polynomial threshold if it can be represented as a sign function of a sparse polynomial (one that consists of a polynomial number of terms). The main result is that polynomial-threshold functions can be characterized by means of their spectral representation. In particular, it is proved that a Boolean function whose L_1 spectral norm is bounded by a polynomial in n is a polynomial-threshold function, and that a Boolean function whose L_∞^(-1) spectral norm is not bounded by a polynomial in n is not a polynomial-threshold function. Some results for AC^0 functions are derived

    Optimized Surface Code Communication in Superconducting Quantum Computers

    Full text link
    Quantum computing (QC) is at the cusp of a revolution. Machines with 100 quantum bits (qubits) are anticipated to be operational by 2020 [googlemachine,gambetta2015building], and several-hundred-qubit machines are around the corner. Machines of this scale have the capacity to demonstrate quantum supremacy, the tipping point where QC is faster than the fastest classical alternative for a particular problem. Because error correction techniques will be central to QC and will be the most expensive component of quantum computation, choosing the lowest-overhead error correction scheme is critical to overall QC success. This paper evaluates two established quantum error correction codes---planar and double-defect surface codes---using a set of compilation, scheduling and network simulation tools. In considering scalable methods for optimizing both codes, we do so in the context of a full microarchitectural and compiler analysis. Contrary to previous predictions, we find that the simpler planar codes are sometimes more favorable for implementation on superconducting quantum computers, especially under conditions of high communication congestion.Comment: 14 pages, 9 figures, The 50th Annual IEEE/ACM International Symposium on Microarchitectur

    On Timing Model Extraction and Hierarchical Statistical Timing Analysis

    Full text link
    In this paper, we investigate the challenges to apply Statistical Static Timing Analysis (SSTA) in hierarchical design flow, where modules supplied by IP vendors are used to hide design details for IP protection and to reduce the complexity of design and verification. For the three basic circuit types, combinational, flip-flop-based and latch-controlled, we propose methods to extract timing models which contain interfacing as well as compressed internal constraints. Using these compact timing models the runtime of full-chip timing analysis can be reduced, while circuit details from IP vendors are not exposed. We also propose a method to reconstruct the correlation between modules during full-chip timing analysis. This correlation can not be incorporated into timing models because it depends on the layout of the corresponding modules in the chip. In addition, we investigate how to apply the extracted timing models with the reconstructed correlation to evaluate the performance of the complete design. Experiments demonstrate that using the extracted timing models and reconstructed correlation full-chip timing analysis can be several times faster than applying the flattened circuit directly, while the accuracy of statistical timing analysis is still well maintained
    • …
    corecore