4,258 research outputs found

    VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

    Full text link
    The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and copious power consumption. Stochastic computing has shown promising results for low-power area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62% average reductions in area and latency compared to the best reported architecture in literature. We also synthesize the circuits in a 65 nm CMOS technology and we show that the proposed integral stochastic architecture results in up to 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation which yields 33% reduction in energy consumption w.r.t. the binary radix implementation without any compromise on performance.Comment: 11 pages, 12 figure

    A Study of Deep Learning Robustness Against Computation Failures

    Full text link
    For many types of integrated circuits, accepting larger failure rates in computations can be used to improve energy efficiency. We study the performance of faulty implementations of certain deep neural networks based on pessimistic and optimistic models of the effect of hardware faults. After identifying the impact of hyperparameters such as the number of layers on robustness, we study the ability of the network to compensate for computational failures through an increase of the network size. We show that some networks can achieve equivalent performance under faulty implementations, and quantify the required increase in computational complexity

    Insights into tunnel FET-based charge pumps and rectifiers for energy harvesting applications

    Get PDF
    In this paper, the electrical characteristics of tunnel field-effect transistor (TFET) devices are explored for energy harvesting front-end circuits with ultralow power consumption. Compared with conventional thermionic technologies, the improved electrical characteristics of TFET devices are expected to increase the power conversion efficiency of front-end charge pumps and rectifiers powered at sub-µW power levels. However, under reverse bias conditions the TFET device presents particular electrical characteristics due to its different carrier injection mechanism. In this paper, it is shown that reverse losses in TFET-based circuits can be attenuated by changing the gate-to-source voltage of reverse-biased TFETs. Therefore, in order to take full advantage of the TFETs in front-end energy harvesting circuits, different circuit approaches are required. In this paper, we propose and discuss different topologies for TFET-based charge pumps and rectifiers for energy harvesting applications.Peer ReviewedPostprint (author's final draft

    Communication channel analysis and real time compressed sensing for high density neural recording devices

    Get PDF
    Next generation neural recording and Brain- Machine Interface (BMI) devices call for high density or distributed systems with more than 1000 recording sites. As the recording site density grows, the device generates data on the scale of several hundred megabits per second (Mbps). Transmitting such large amounts of data induces significant power consumption and heat dissipation for the implanted electronics. Facing these constraints, efficient on-chip compression techniques become essential to the reduction of implanted systems power consumption. This paper analyzes the communication channel constraints for high density neural recording devices. This paper then quantifies the improvement on communication channel using efficient on-chip compression methods. Finally, This paper describes a Compressed Sensing (CS) based system that can reduce the data rate by > 10x times while using power on the order of a few hundred nW per recording channel

    Memristor MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High Performance Search Engines

    Full text link
    Large-capacity Content Addressable Memory (CAM) is a key element in a wide variety of applications. The inevitable complexities of scaling MOS transistors introduce a major challenge in the realization of such systems. Convergence of disparate technologies, which are compatible with CMOS processing, may allow extension of Moore's Law for a few more years. This paper provides a new approach towards the design and modeling of Memristor (Memory resistor) based Content Addressable Memory (MCAM) using a combination of memristor MOS devices to form the core of a memory/compare logic cell that forms the building block of the CAM architecture. The non-volatile characteristic and the nanoscale geometry together with compatibility of the memristor with CMOS processing technology increases the packing density, provides for new approaches towards power management through disabling CAM blocks without loss of stored data, reduces power dissipation, and has scope for speed improvement as the technology matures.Comment: 10 pages, 11 figure

    Area/latency optimized early output asynchronous full adders and relative-timed ripple carry adders

    Get PDF
    This article presents two area/latency optimized gate level asynchronous full adder designs which correspond to early output logic. The proposed full adders are constructed using the delay-insensitive dual-rail code and adhere to the four-phase return-to-zero handshaking. For an asynchronous ripple carry adder (RCA) constructed using the proposed early output full adders, the relative-timing assumption becomes necessary and the inherent advantages of the relative-timed RCA are: (1) computation with valid inputs, i.e., forward latency is data-dependent, and (2) computation with spacer inputs involves a bare minimum constant reverse latency of just one full adder delay, thus resulting in the optimal cycle time. With respect to different 32-bit RCA implementations, and in comparison with the optimized strong-indication, weak-indication, and early output full adder designs, one of the proposed early output full adders achieves respective reductions in latency by 67.8, 12.3 and 6.1 %, while the other proposed early output full adder achieves corresponding reductions in area by 32.6, 24.6 and 6.9 %, with practically no power penalty. Further, the proposed early output full adders based asynchronous RCAs enable minimum reductions in cycle time by 83.4, 15, and 8.8 % when considering carry-propagation over the entire RCA width of 32-bits, and maximum reductions in cycle time by 97.5, 27.4, and 22.4 % for the consideration of a typical carry chain length of 4 full adder stages, when compared to the least of the cycle time estimates of various strong-indication, weak-indication, and early output asynchronous RCAs of similar size. All the asynchronous full adders and RCAs were realized using standard cells in a semi-custom design fashion based on a 32/28 nm CMOS process technology
    • …
    corecore