
    Selective Decoding in Associative Memories Based on Sparse-Clustered Networks

    Associative memories are structures that can retrieve previously stored information given a partial input pattern, instead of an explicit address as in indexed memories. A few hardware approaches have recently been introduced for a new family of associative memories based on Sparse-Clustered Networks (SCNs) that show attractive features. These architectures are suitable for implementations with low retrieval latency, but are limited to small networks that store a few hundred data entries. In this paper, a new hardware architecture of SCNs is proposed that features a new data-storage technique as well as a method we refer to as Selective Decoding (SD-SCN). The SD-SCN has been implemented on an FPGA similar to the one used in previous efforts and achieves two orders of magnitude higher capacity with no error-performance penalty, at the cost of a few extra clock cycles per data access. Comment: 4 pages, accepted at the IEEE GlobalSIP 2013 conference

    VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

    The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention, since many applications require high-speed operation that suits a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area and high power consumption. Stochastic computing has shown promising results for low-power, area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex-7 FPGA, resulting in 45% and 62% average reductions in area and latency, respectively, compared to the best architecture reported in the literature. We also synthesize the circuits in a 65 nm CMOS technology and show that the proposed integral stochastic architecture results in up to a 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to the fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation, which yields a 33% reduction in energy consumption w.r.t. the binary radix implementation without any compromise on performance. Comment: 11 pages, 12 figures
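As a rough illustration of the integer form of stochastic computation, the sketch below assumes that a value x in [0, m] is carried by an "integral" stream whose elements are the element-wise sum of m unipolar binary streams, each encoding x/m. Under that assumption, element-wise integer addition and multiplication of independent streams add and multiply the represented values. The function names are hypothetical, and the code models the arithmetic only, not the paper's circuits.

```python
import random


def binary_stream(p, length, rng):
    """Unipolar binary stochastic stream: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def integral_stream(x, m, length, rng):
    """Integral stream for x in [0, m]: element-wise sum of m binary streams encoding x/m."""
    streams = [binary_stream(x / m, length, rng) for _ in range(m)]
    return [sum(bits) for bits in zip(*streams)]


def decode(stream):
    # The represented value is the mean of the stream's integer elements.
    return sum(stream) / len(stream)


def add(a, b):
    # Integer adder per element: value ranges add, with no 1/2 scaling
    # as in MUX-based addition of binary stochastic streams.
    return [x + y for x, y in zip(a, b)]


def multiply(a, b):
    # Integer multiplier per element; valid when the two streams are independent.
    return [x * y for x, y in zip(a, b)]


rng = random.Random(0)
a = integral_stream(1.5, 2, 20000, rng)  # elements in {0, 1, 2}
b = integral_stream(0.5, 1, 20000, rng)  # an ordinary binary stream
print(round(decode(add(a, b)), 2), round(decode(multiply(a, b)), 2))  # close to 2.0 and 0.75
```

The decoded results are only estimates whose accuracy improves with stream length, which is the latency cost the abstract mentions.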

    Stochastic Simulated Quantum Annealing for Fast Solution of Combinatorial Optimization Problems

    In this paper, we introduce stochastic simulated quantum annealing (SSQA) for large-scale combinatorial optimization problems. SSQA is designed based on stochastic computing and quantum Monte Carlo, which can simulate quantum annealing (QA) by using multiple replicas of spins (probabilistic bits) in classical computing. The use of stochastic computing leads to an efficient parallel spin-state update algorithm, enabling a quick search for a solution around the global minimum energy. Therefore, unlike QA, SSQA realizes quantum-like annealing for large-scale problems and can handle fully connected models in combinatorial optimization. The proposed method is evaluated in MATLAB on graph isomorphism problems, which are typical combinatorial optimization problems. The proposed method achieves a convergence speed an order of magnitude faster than a conventional stochastic simulated annealing method. Additionally, for similar convergence probabilities, it can handle problems 100 times larger than QA and 25 times larger than a traditional SA method. Comment: 14 pages, 8 figures
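The abstract does not give the SSQA update equations, so the sketch below is a generic, simplified stand-in rather than the authors' algorithm: replica-coupled p-bit annealing in the style of path-integral simulated quantum annealing. Each replica is a copy of the Ising spins, neighbouring replicas are coupled along a ring, and the coupling `q` ramps up over time in place of a shrinking transverse field. It uses sequential floating-point updates instead of stochastic-computing circuits and parallel updates; the schedule, parameter values, and function names are assumptions.

```python
import math
import random


def ising_energy(J, h, m):
    """H(m) = -1/2 * sum_ij J[i][j] m_i m_j - sum_i h[i] m_i, with spins m_i in {-1, +1}."""
    n = len(m)
    pair = sum(J[i][j] * m[i] * m[j] for i in range(n) for j in range(n))
    return -0.5 * pair - sum(h[i] * m[i] for i in range(n))


def replica_pbit_annealing(J, h, replicas=8, sweeps=300, beta=3.0, q_max=1.0, seed=0):
    """Hypothetical sketch: p-bit replicas coupled on a ring (the Trotter direction).

    J must be symmetric with a zero diagonal. Returns the lowest-energy spin
    configuration seen in any replica, together with its energy.
    """
    rng = random.Random(seed)
    n = len(h)
    spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(replicas)]
    best_m, best_e = None, math.inf
    for t in range(sweeps):
        q = q_max * t / max(sweeps - 1, 1)  # linear coupling schedule (an assumption)
        for k in range(replicas):
            up, down = spins[(k - 1) % replicas], spins[(k + 1) % replicas]
            for i in range(n):
                # Local field: problem couplings + bias + pull toward the neighbouring replicas.
                field = sum(J[i][j] * spins[k][j] for j in range(n)) + h[i] + q * (up[i] + down[i])
                # p-bit update: +1 with probability (1 + tanh(beta * field)) / 2.
                spins[k][i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1
            e = ising_energy(J, h, spins[k])
            if e < best_e:
                best_m, best_e = list(spins[k]), e
    return best_m, best_e
```

On a 4-spin all-to-all ferromagnet with a small positive bias, the search should return the all-up ground state with energy -6.4.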

    Algorithm and Architecture of Fully-Parallel Associative Memories Based on Sparse Clustered Networks

    Associative memories retrieve stored information given partial or erroneous input patterns. A new family of associative memories based on Sparse Clustered Networks (SCNs) has recently been introduced that can store many more messages than classical Hopfield Neural Networks (HNNs). In this paper, we propose fully-parallel hardware architectures of such memories for partial or erroneous inputs. The proposed architectures eliminate winner-take-all modules and thus reduce the hardware complexity, consuming 65% fewer FPGA lookup tables and increasing the operating frequency by approximately 1.9 times compared to previous work. Furthermore, the scaling behaviour of the implemented architectures for various design choices is investigated. We explore the effect of varying design variables, such as the number of clusters, network nodes, and erased symbols, on the error performance and the hardware resources.
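One simple way to drop the winner-take-all stage, assumed here for illustration and not necessarily the rule used in the paper, is to replace the argmax with a logical AND: in each erased cluster, every neuron that is linked to all known neurons stays active. Each candidate is then tested independently, which maps naturally onto fully-parallel hardware. The function names are hypothetical and only erased (not erroneous) symbols are handled.

```python
from itertools import combinations


def build_links(messages):
    """Store messages as cliques; returns neuron -> set of linked neurons, neuron = (cluster, symbol)."""
    links = {}
    for msg in messages:
        neurons = list(enumerate(msg))
        for a, b in combinations(neurons, 2):
            links.setdefault(a, set()).add(b)
            links.setdefault(b, set()).add(a)
    return links


def retrieve_without_wta(links, partial, neurons_per_cluster):
    """Each erased cluster keeps every neuron linked to *all* known neurons (a logical AND)."""
    known = [(i, s) for i, s in enumerate(partial) if s is not None]
    out = []
    for i, s in enumerate(partial):
        if s is not None:
            out.append({s})
            continue
        # No comparison between neurons is needed: each one checks its own links,
        # so all neurons of all erased clusters could be evaluated in parallel.
        out.append({j for j in range(neurons_per_cluster)
                    if all(k in links.get((i, j), set()) for k in known)})
    return out
```

The result is a set of surviving neurons per cluster. With stored messages `[0, 1, 2]`, `[3, 1, 0]`, and `[0, 2, 1]`, the input `[0, 1, None]` leaves only neuron 2 in the last cluster, while `[0, None, None]` leaves two candidates per cluster because the stored messages overlap.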

    Algorithm and Architecture for a Low-Power Content-Addressable Memory Based on Sparse-Clustered Networks

    We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. The proposed architecture is based on a recently developed sparse clustered network using binary connections that, on average, eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower than that of a conventional low-power CAM design. Given an input tag, the proposed architecture computes a few possibilities for the location of the matched tag and performs the comparisons on them to locate a single valid match. TSMC 65 nm CMOS technology was used for simulation purposes. For a given selection of design parameters, such as the number of CAM entries, the energy consumption and the search delay of the proposed design are 8% and 26%, respectively, of those of the conventional NAND architecture, with a 10% area overhead. A design methodology based on the silicon area and power budgets and on performance requirements is discussed.
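The search flow described above (compute a few candidate locations, then compare only those) can be sketched in software as follows. This is a minimal model, assuming that the low bits of the tag are split into fixed-width sub-tags, that each sub-tag value is a neuron in its own cluster, and that each neuron holds binary links to the CAM addresses written with that sub-tag. The class name and the partitioning are hypothetical, and the counter stands in for the full-width comparisons that dominate a conventional CAM's energy.

```python
class SCNCamSketch:
    """Hypothetical sketch: sub-tag neurons with binary links to CAM entry addresses."""

    def __init__(self, clusters, subtag_bits):
        self.clusters, self.subtag_bits = clusters, subtag_bits
        # links[c][v]: addresses whose tag has value v in sub-tag c.
        self.links = [{} for _ in range(clusters)]
        self.entries = {}  # address -> full stored tag
        self.comparisons = 0  # full-width tag comparisons performed so far

    def _subtags(self, tag):
        # Only the low clusters * subtag_bits bits drive the network (an assumption),
        # so different tags can share candidates and a final comparison is still needed.
        mask = (1 << self.subtag_bits) - 1
        return [(tag >> (c * self.subtag_bits)) & mask for c in range(self.clusters)]

    def write(self, address, tag):
        self.entries[address] = tag
        for c, v in enumerate(self._subtags(tag)):
            self.links[c].setdefault(v, set()).add(address)

    def search(self, tag):
        # Candidates: addresses linked to the active neuron in every cluster (AND of links).
        candidates = None
        for c, v in enumerate(self._subtags(tag)):
            linked = self.links[c].get(v, set())
            candidates = linked if candidates is None else candidates & linked
        # Full comparisons only on the few candidates instead of on every entry.
        for address in sorted(candidates):
            self.comparisons += 1
            if self.entries[address] == tag:
                return address
        return None
```

With 16-bit tags `0x1234`, `0x5634`, `0x9A78`, and `0xBC12` written to addresses 0 to 3 and two 4-bit clusters, searching for `0x5634` examines only the two entries whose low byte is `0x34` rather than all four.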
