5,347 research outputs found
Compressive Imaging Using RIP-Compliant CMOS Imager Architecture and Landweber Reconstruction
In this paper, we present a new image sensor architecture for fast and accurate compressive sensing (CS) of natural images. Measurement matrices usually employed in CS CMOS image sensors are recursive pseudo-random binary matrices. We have proved that the restricted isometry property of these matrices is limited by a low sparsity constant. The quality of these matrices is also affected by the non-idealities of pseudo-random number generators (PRNG). To overcome these limitations, we propose a hardware-friendly pseudo-random ternary measurement matrix generated on-chip by means of class III elementary cellular automata (ECA). These ECA present a chaotic behavior that emulates random CS measurement matrices better than other PRNG. We have combined this new architecture with a block-based CS smoothed-projected Landweber reconstruction algorithm. By means of single value decomposition, we have adapted this algorithm to perform fast and precise reconstruction while operating with binary and ternary matrices. Simulations are provided to qualify the approach.Ministerio de Economía y Competitividad TEC2015-66878-C3-1-RJunta de Andalucía TIC 2338-2013Office of Naval Research (USA) N000141410355European Union H2020 76586
Algorithmic Aspects of Cyclic Combinational Circuit Synthesis
Digital circuits are called combinational if they are memoryless: if they have outputs that depend only on the current values of the inputs. Combinational circuits are generally thought of as acyclic (i.e., feed-forward) structures. And yet, cyclic circuits can be combinational. Cycles sometimes occur in designs synthesized from high-level descriptions, as well as in bus-based designs [16]. Feedback in such cases is carefully contrived, typically occurring when functional units are connected in a cyclic topology. Although the premise of cycles in combinational circuits has been accepted, and analysis techniques have been proposed [7], no one has attempted the synthesis of circuits with feedback at the logic level.
We have argued the case for a paradigm shift in combinational circuit design [10]. We should no longer think of combinational logic as acyclic in theory or in practice, since most combinational circuits are best designed with cycles. We have proposed a general methodology for the synthesis of multilevel networks with cyclic topologies and incorporated it in a general logic synthesis environment. In trials, benchmark circuits were optimized significantly, with improvements of up to 30%I n the area. In this paper, we discuss algorithmic aspects of cyclic circuit design. We formulate a symbolic framework for analysis based on a divide-and-conquer strategy. Unlike previous approaches, our method does not require ternary-valued simulation. Our analysis for combinationality is tightly coupled with the synthesis phase, in which we assemble a combinational network from smaller combinational components. We discuss the underpinnings of the heuristic search methods and present examples as well as synthesis results for benchmark circuits.
In this paper, we discuss algorithmic aspects of cyclic circuit design. We formulate a symbolic framework for analysis based on a divide-and-conquer strategy. Unlike previous approaches, our method does not require ternary-valued simulation. Our analysis for combinationality is tightly coupled with the synthesis phase, in which we assemble a combinational network from smaller combinational components. We discuss the underpinnings of the heuristic search methods and present examples as well as synthesis results for benchmark circuits
Novel Ternary Logic Gates Design in Nanoelectronics
In this paper, standard ternary logic gates are initially designed to considerably reduce static power consumption. This study proposes novel ternary gates based on two supply voltages in which the direct current is eliminated and the leakage current is reduced considerably. In addition, ST-OR and ST-AND are generated directly instead of ST-NAND and ST-NOR. The proposed gates have a high noise margin near V_(DD)/4. The simulation results indicated that the power consumption and PDP underwent a~sharp decrease and noise margin showed a considerable increase in comparison to both one supply and two supply based designs in previous works. PDP is improved in the proposed OR, as compared to one supply and two supply based previous works about 83% and 63%, respectively. Also, a memory cell is designed using the proposed STI logic gate, which has a considerably lower static power to store logic ‘1’ and the static noise margin, as compared to other designs
Hardware-efficient on-line learning through pipelined truncated-error backpropagation in binary-state networks
Artificial neural networks (ANNs) trained using backpropagation are powerful
learning architectures that have achieved state-of-the-art performance in
various benchmarks. Significant effort has been devoted to developing custom
silicon devices to accelerate inference in ANNs. Accelerating the training
phase, however, has attracted relatively little attention. In this paper, we
describe a hardware-efficient on-line learning technique for feedforward
multi-layer ANNs that is based on pipelined backpropagation. Learning is
performed in parallel with inference in the forward pass, removing the need for
an explicit backward pass and requiring no extra weight lookup. By using binary
state variables in the feedforward network and ternary errors in
truncated-error backpropagation, the need for any multiplications in the
forward and backward passes is removed, and memory requirements for the
pipelining are drastically reduced. Further reduction in addition operations
owing to the sparsity in the forward neural and backpropagating error signal
paths contributes to highly efficient hardware implementation. For
proof-of-concept validation, we demonstrate on-line learning of MNIST
handwritten digit classification on a Spartan 6 FPGA interfacing with an
external 1Gb DDR2 DRAM, that shows small degradation in test error performance
compared to an equivalently sized binary ANN trained off-line using standard
back-propagation and exact errors. Our results highlight an attractive synergy
between pipelined backpropagation and binary-state networks in substantially
reducing computation and memory requirements, making pipelined on-line learning
practical in deep networks.Comment: Now also consider 0/1 binary activations. Memory access statistics
reporte
CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency
We present a 3.1 POp/s/W fully digital hardware accelerator for ternary
neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine,
focuses on minimizing non-computational energy and switching activity so that
dynamic power spent on storing (locally or globally) intermediate results is
minimized. This is achieved by 1) a data path architecture completely unrolled
in the feature map and filter dimensions to reduce switching activity by
favoring silencing over iterative computation and maximizing data re-use, 2)
targeting ternary neural networks which, in contrast to binary NNs, allow for
sparse weights which reduce switching activity, and 3) introducing an optimized
training method for higher sparsity of the filter weights, resulting in a
further reduction of the switching activity. Compared with state-of-the-art
accelerators, CUTIE achieves greater or equal accuracy while decreasing the
overall core inference energy cost by a factor of 4.8x-21x
Capacity, Fidelity, and Noise Tolerance of Associative Spatial-Temporal Memories Based on Memristive Neuromorphic Network
We have calculated the key characteristics of associative
(content-addressable) spatial-temporal memories based on neuromorphic networks
with restricted connectivity - "CrossNets". Such networks may be naturally
implemented in nanoelectronic hardware using hybrid CMOS/memristor circuits,
which may feature extremely high energy efficiency, approaching that of
biological cortical circuits, at much higher operation speed. Our numerical
simulations, in some cases confirmed by analytical calculations, have shown
that the characteristics depend substantially on the method of information
recording into the memory. Of the four methods we have explored, two look
especially promising - one based on the quadratic programming, and the other
one being a specific discrete version of the gradient descent. The latter
method provides a slightly lower memory capacity (at the same fidelity) then
the former one, but it allows local recording, which may be more readily
implemented in nanoelectronic hardware. Most importantly, at the synchronous
retrieval, both methods provide a capacity higher than that of the well-known
Ternary Content-Addressable Memories with the same number of nonvolatile memory
cells (e.g., memristors), though the input noise immunity of the CrossNet
memories is somewhat lower
- …