20 research outputs found

    The Efficient Design of Time-to-Digital Converters

    Get PDF

    Energy Efficient and Error Resilient Neuromorphic Computing in VLSI

    Get PDF
    Realization of the conventional Von Neumann architecture faces increasing challenges due to growing process variations, device reliability and power consumption. As an appealing architectural solution, brain-inspired neuromorphic computing has drawn a great deal of research interest due to its potential improved scalability and power efficiency, and better suitability in processing complex tasks. Moreover, inherit error resilience in neuromorphic computing allows remarkable power and energy savings by exploiting approximate computing. This dissertation focuses on a scalable and energy efficient neurocomputing architecture which leverages emerging memristor nanodevices and a novel approximate arithmetic for cognitive computing. First, brain-inspired digital neuromorphic processor (DNP) architecture with memristive synaptic crossbar is presented for large scale spiking neural networks. We leverage memristor nanodevices to build an N ×N crossbar array to store not only multibit synaptic weight values but also the network configuration data with significantly reduced area cost. Additionally, the crossbar array is accessible both column- and row-wise to significantly expedite the synaptic weight update process for on-chip learning. The proposed digital pulse width modulator (PWM) readily creates a binary pulse with various durations to read and write the multilevel memristors with low cost. Our design integrates N digital leaky integrate-and-fire (LIF) silicon neurons to mimic their biological counterparts and the respective on-chip learning circuits for implementing spike timing dependent plasticity (STDP) learning rules. The proposed column based analog-to-digital conversion (ADC) scheme accumulates the pre-synaptic weights of a neuron efficiently and reduces silicon area by using only one shared arithmetic unit for processing LIF operations of all N neurons. With 256 silicon neurons, the learning circuits and 64K synapses, the power dissipation and area of our design are evaluated as 6.45 mW and 1.86 mm2, respectively, in a 90 nm CMOS technology. Furthermore, arithmetic computations contribute significantly to the overall processing time and power of the proposed architecture. In particular, addition and comparison operations represent 88.5% and 42.9% of processing time and power for digital LIF computation, respectively. Hence, by exploiting the built-in resilience of the presented neuromorphic architecture, we propose novel approximate adder and comparator designs to significantly reduce energy consumption with a very low er- ror rate. The significantly improved error rate and critical path delay stem from a novel carry prediction technique that leverages the information from less significant input bits in a parallel manner. An error magnitude reduction scheme is proposed to further reduce amount of error once detected with low cost in the proposed adder design. Implemented in a commercial 90 nm CMOS process, it is shown that the proposed adder is up to 2.4× faster and 43% more energy efficient over traditional adders while having an error rate of only 0.18%. Additionally, the proposed com- parator achieves an error rate of less than 0.1% and an energy reduction of up to 4.9× compared to the conventional ones. The proposed arithmetic has been adopted in a VLSI-based neuromorphic character recognition chip using unsupervised learning. The approximation errors of the proposed arithmetic units have been shown to have negligible impacts on the training process. Moreover, the energy saving of up to 66.5% over traditional arithmetic units is achieved for the neuromorphic chip with scaled supply levels

    The 1991 3rd NASA Symposium on VLSI Design

    Get PDF
    Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2

    Energy-Efficient Neural Network Architectures

    Full text link
    Emerging systems for artificial intelligence (AI) are expected to rely on deep neural networks (DNNs) to achieve high accuracy for a broad variety of applications, including computer vision, robotics, and speech recognition. Due to the rapid growth of network size and depth, however, DNNs typically result in high computational costs and introduce considerable power and performance overheads. Dedicated chip architectures that implement DNNs with high energy efficiency are essential for adding intelligence to interactive edge devices, enabling them to complete increasingly sophisticated tasks by extending battery lie. They are also vital for improving performance in cloud servers that support demanding AI computations. This dissertation focuses on architectures and circuit technologies for designing energy-efficient neural network accelerators. First, a deep-learning processor is presented for achieving ultra-low power operation. Using a heterogeneous architecture that includes a low-power always-on front-end and a selectively-enabled high-performance back-end, the processor dynamically adjusts computational resources at runtime to support conditional execution in neural networks and meet performance targets with increased energy efficiency. Featuring a reconfigurable datapath and a memory architecture optimized for energy efficiency, the processor supports multilevel dynamic activation of neural network segments, performing object detection tasks with 5.3x lower energy consumption in comparison with a static execution baseline. Fabricated in 40nm CMOS, the processor test-chip dissipates 0.23mW at 5.3 fps. It demonstrates energy scalability up to 28.6 TOPS/W and can be configured to run a variety of workloads, including severely power-constrained ones such as always-on monitoring in mobile applications. To further improve the energy efficiency of the proposed heterogeneous architecture, a new charge-recovery logic family, called zero-short-circuit current (ZSCC) logic, is proposed to decrease the power consumption of the always-on front-end. By relying on dedicated circuit topologies and a four-phase clocking scheme, ZSCC operates with significantly reduced short-circuit currents, realizing order-of-magnitude power savings at relatively low clock frequencies (in the order of a few MHz). The efficiency and applicability of ZSCC is demonstrated through an ANSI S1.11 1/3 octave filter bank chip for binaural hearing aids with two microphones per ear. Fabricated in a 65nm CMOS process, this charge-recovery chip consumes 13.8µW with a 1.75MHz clock frequency, achieving 9.7x power reduction per input in comparison with a 40nm monophonic single-input chip that represents the published state of the art. The ability of ZSCC to further increase the energy efficiency of the heterogeneous neural network architecture is demonstrated through the design and evaluation of a ZSCC-based front-end. Simulation results show 17x power reduction compared with a conventional static CMOS implementation of the same architecture.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147614/1/hsiwu_1.pd

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    NASA Space Engineering Research Center Symposium on VLSI Design

    Get PDF
    The NASA Space Engineering Research Center (SERC) is proud to offer, at its second symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories and the electronics industry. These featured speakers share insights into next generation advances that will serve as a basis for future VLSI design. Questions of reliability in the space environment along with new directions in CAD and design are addressed by the featured speakers

    Cross-Layer Automated Hardware Design for Accuracy-Configurable Approximate Computing

    Get PDF
    Approximate Computing trades off computation accuracy against performance or energy efficiency. It is a design paradigm that arose in the last decade as an answer to diminishing returns from Dennard\u27s scaling and a shift in the prominent workloads. A range of modern workloads, categorized mainly as recognition, mining, and synthesis, features an inherent tolerance to approximations. Their characteristics, such as redundancies in their input data and robust-to-noise algorithms, allow them to produce outputs of acceptable quality, despite an approximation in some of their computations. Approximate Computing leverages the application tolerance by relaxing the exactness in computation towards primary design goals of increasing performance or improving energy efficiency. Existing techniques span across the abstraction layers of computer systems where cross-layer techniques are shown to offer a larger design space and yield higher savings. Currently, the majority of the existing work aims at meeting a single accuracy. The extent of approximation tolerance, however, significantly varies with a change in input characteristics and applications. In this dissertation, methods and implementations are presented for cross-layer and automated design of accuracy-configurable Approximate Computing to maximally exploit the performance and energy benefits. In particular, this dissertation addresses the following challenges and introduces novel contributions: A main Approximate Computing category in hardware is to scale either voltage or frequency beyond the safe limits for power or performance benefits, respectively. The rationale is that timing errors would be gradual and for an initial range tolerable. This scaling enables a fine-grain accuracy-configurability by varying the timing error occurrence. However, conventional synthesis tools aim at meeting a single delay for all paths within the circuit. Subsequently, with voltage or frequency scaling, either all paths succeed, or a large number of paths fail simultaneously, with a steep increase in error rate and magnitude. This dissertation presents an automated method for minimizing path delays by individually constraining the primary outputs of combinational circuits. As a result, it reduces the number of failing paths and makes the timing errors significantly more gradual, and also rarer and smaller on average. Additionally, it reveals that delays can be significantly reduced towards the least significant bit (LSB) and allows operating at a higher frequency when small operands are computed. Precision scaling, i.e., reducing the representation of data and its accuracy is widely used in multiple abstraction layers in Approximate Computing. Reducing data precision also reduces the transistor toggles, and therefore the dynamic power consumption. Application and architecture level precision scaling results in using only LSBs of the circuit. Arithmetic circuits often have less complexity and logic depth in LSBs compared to most significant bits (MSB). To take advantage of this circuit property, a delay-altering synthesis methodology is proposed. The method finds energy-optimal delay values under configurable precision usage and assigns them to primary outputs used for different precisions. Thereby, it enables dynamic frequency-precision scalable circuits for energy efficiency. Within the hardware architecture, it is possible to instantiate multiple units with the same functionality with different fixed approximation levels, where each block benefits from having fewer transistors and also synthesis relaxations. These blocks can be selected dynamically and thus allow to configure the accuracy during runtime. Instantiating such approximate blocks can be a lower dynamic power but higher area and leakage cost alternative to the current state-of-the-art gating mechanisms which switch off a group of paths in the circuit to reduce the toggling activity. Jointly, instantiating multiple blocks and gating mechanisms produce a large design space of accuracy-configurable hardware, where energy-optimal solutions require a cross-layer search in architecture and circuit levels. To that end, an approximate hardware synthesis methodology is proposed with joint optimizations in architecture and circuit for dynamic accuracy scaling, and thereby it enables energy vs. area trade-offs

    Parallel reconfigurable single photon avalanche diode array for optical communications

    Get PDF
    There is a pressing need to develop alternative communications links due to a number of physical phenomena, limiting the bandwidth and energy efficiency of wire-based systems or economic factors such as cost, material-supply reliability and environmental costs. Networks have moved to optical connections to reduce costs, energy use and to supply high data rates. A primary concern is that current optical-detection devices require high optical power to achieve fast data rates with high signal quality. The energy required therefore, quickly becomes a problem. In this thesis, advances in single-photon avalanche diodes (SPADs) are utilised to reduce the amount of light needed and to reduce the overall energy budget. Current high performance receivers often use exotic materials, many of which have severe environmental impact and have cost, supply and political restrictions. These present a problem when it comes to integration; hence silicon technology is used, allowing small, mass-producible, low power receivers. A reconfigurable SPAD-based integrating receiver in standard 130nm imaging CMOS is presented for links with a readout bandwidth of 100MHz. A maximum count rate of 58G photon/s is observed, with a dynamic range of ≈ 79dB, a sensitivity of ≈ −31.7dBm at 100MHz and a BER of ≈ 1x10−9. We investigate the properties of the receiver for optical communications in the visible spectrum, using its added functionality and reconfigurability to experimentally explore non-ideal influences. The all-digital 32x32 SPAD array, achieves a minimum dead time of 5.9ns, and a median dark count rate (DCR) of 2.5kHz per SPAD. High noise devices can be weighted or removed to optimise the SNR. The power requirements, transient response and received data are explored and limiting factors similar to those of photodiode receivers are observed. The thesis concludes that data can be captured well with such a device but more electrical energy is needed at the receiver due to its fundamental operation. Overall, optical power can be reduced, allowing significant savings in either transmitter power or the transmission length, along with the advantages of an integrated digital chip
    corecore