5,347 research outputs found

    Compressive Imaging Using RIP-Compliant CMOS Imager Architecture and Landweber Reconstruction

    Get PDF
    In this paper, we present a new image sensor architecture for fast and accurate compressive sensing (CS) of natural images. Measurement matrices usually employed in CS CMOS image sensors are recursive pseudo-random binary matrices. We have proved that the restricted isometry property of these matrices is limited by a low sparsity constant. The quality of these matrices is also affected by the non-idealities of pseudo-random number generators (PRNG). To overcome these limitations, we propose a hardware-friendly pseudo-random ternary measurement matrix generated on-chip by means of class III elementary cellular automata (ECA). These ECA present a chaotic behavior that emulates random CS measurement matrices better than other PRNG. We have combined this new architecture with a block-based CS smoothed-projected Landweber reconstruction algorithm. By means of single value decomposition, we have adapted this algorithm to perform fast and precise reconstruction while operating with binary and ternary matrices. Simulations are provided to qualify the approach.Ministerio de Economía y Competitividad TEC2015-66878-C3-1-RJunta de Andalucía TIC 2338-2013Office of Naval Research (USA) N000141410355European Union H2020 76586

    Algorithmic Aspects of Cyclic Combinational Circuit Synthesis

    Get PDF
    Digital circuits are called combinational if they are memoryless: if they have outputs that depend only on the current values of the inputs. Combinational circuits are generally thought of as acyclic (i.e., feed-forward) structures. And yet, cyclic circuits can be combinational. Cycles sometimes occur in designs synthesized from high-level descriptions, as well as in bus-based designs [16]. Feedback in such cases is carefully contrived, typically occurring when functional units are connected in a cyclic topology. Although the premise of cycles in combinational circuits has been accepted, and analysis techniques have been proposed [7], no one has attempted the synthesis of circuits with feedback at the logic level. We have argued the case for a paradigm shift in combinational circuit design [10]. We should no longer think of combinational logic as acyclic in theory or in practice, since most combinational circuits are best designed with cycles. We have proposed a general methodology for the synthesis of multilevel networks with cyclic topologies and incorporated it in a general logic synthesis environment. In trials, benchmark circuits were optimized significantly, with improvements of up to 30%I n the area. In this paper, we discuss algorithmic aspects of cyclic circuit design. We formulate a symbolic framework for analysis based on a divide-and-conquer strategy. Unlike previous approaches, our method does not require ternary-valued simulation. Our analysis for combinationality is tightly coupled with the synthesis phase, in which we assemble a combinational network from smaller combinational components. We discuss the underpinnings of the heuristic search methods and present examples as well as synthesis results for benchmark circuits. In this paper, we discuss algorithmic aspects of cyclic circuit design. We formulate a symbolic framework for analysis based on a divide-and-conquer strategy. Unlike previous approaches, our method does not require ternary-valued simulation. Our analysis for combinationality is tightly coupled with the synthesis phase, in which we assemble a combinational network from smaller combinational components. We discuss the underpinnings of the heuristic search methods and present examples as well as synthesis results for benchmark circuits

    Novel Ternary Logic Gates Design in Nanoelectronics

    Get PDF
    In this paper, standard ternary logic gates are initially designed to considerably reduce static power consumption. This study proposes novel ternary gates based on two supply voltages in which the direct current is eliminated and the leakage current is reduced considerably. In addition, ST-OR and ST-AND are generated directly instead of ST-NAND and ST-NOR. The proposed gates have a high noise margin near V_(DD)/4. The simulation results indicated that the power consumption and PDP underwent a~sharp decrease and noise margin showed a considerable increase in comparison to both one supply and two supply based designs in previous works. PDP is improved in the proposed OR, as compared to one supply and two supply based previous works about 83% and 63%, respectively. Also, a memory cell is designed using the proposed STI logic gate, which has a considerably lower static power to store logic ‘1’ and the static noise margin, as compared to other designs

    Hardware-efficient on-line learning through pipelined truncated-error backpropagation in binary-state networks

    Get PDF
    Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. Learning is performed in parallel with inference in the forward pass, removing the need for an explicit backward pass and requiring no extra weight lookup. By using binary state variables in the feedforward network and ternary errors in truncated-error backpropagation, the need for any multiplications in the forward and backward passes is removed, and memory requirements for the pipelining are drastically reduced. Further reduction in addition operations owing to the sparsity in the forward neural and backpropagating error signal paths contributes to highly efficient hardware implementation. For proof-of-concept validation, we demonstrate on-line learning of MNIST handwritten digit classification on a Spartan 6 FPGA interfacing with an external 1Gb DDR2 DRAM, that shows small degradation in test error performance compared to an equivalently sized binary ANN trained off-line using standard back-propagation and exact errors. Our results highlight an attractive synergy between pipelined backpropagation and binary-state networks in substantially reducing computation and memory requirements, making pipelined on-line learning practical in deep networks.Comment: Now also consider 0/1 binary activations. Memory access statistics reporte

    CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

    Get PDF
    We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x-21x

    Capacity, Fidelity, and Noise Tolerance of Associative Spatial-Temporal Memories Based on Memristive Neuromorphic Network

    Full text link
    We have calculated the key characteristics of associative (content-addressable) spatial-temporal memories based on neuromorphic networks with restricted connectivity - "CrossNets". Such networks may be naturally implemented in nanoelectronic hardware using hybrid CMOS/memristor circuits, which may feature extremely high energy efficiency, approaching that of biological cortical circuits, at much higher operation speed. Our numerical simulations, in some cases confirmed by analytical calculations, have shown that the characteristics depend substantially on the method of information recording into the memory. Of the four methods we have explored, two look especially promising - one based on the quadratic programming, and the other one being a specific discrete version of the gradient descent. The latter method provides a slightly lower memory capacity (at the same fidelity) then the former one, but it allows local recording, which may be more readily implemented in nanoelectronic hardware. Most importantly, at the synchronous retrieval, both methods provide a capacity higher than that of the well-known Ternary Content-Addressable Memories with the same number of nonvolatile memory cells (e.g., memristors), though the input noise immunity of the CrossNet memories is somewhat lower
    corecore