14 research outputs found

    Selective Decoding in Associative Memories Based on Sparse-Clustered Networks

    Associative memories are structures that can retrieve previously stored information given a partial input pattern, instead of an explicit address as in indexed memories. A few hardware approaches have recently been introduced for a new family of associative memories based on Sparse-Clustered Networks (SCNs) that show attractive features. These architectures are suitable for implementations with low retrieval latency, but are limited to small networks that store only a few hundred data entries. In this paper, a new hardware architecture for SCNs is proposed that features a new data-storage technique as well as a method we refer to as Selective Decoding (SD-SCN). The SD-SCN has been implemented on an FPGA similar to that used in previous efforts and achieves two orders of magnitude higher capacity, with no error-performance penalty but at the cost of a few extra clock cycles per data access. Comment: 4 pages, accepted at the IEEE GlobalSIP 2013 conference.
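The retrieval principle that these SCN-based architectures build on can be illustrated with a minimal software sketch. The class name, parameters, and the score-and-pick loop below are illustrative assumptions for a Gripon-Berrou-style sparse clustered network, not the paper's hardware design:

```python
# Minimal sketch of an SCN associative memory: c clusters of l binary
# neurons, messages of c symbols, stored as binary links between the
# active neurons (one per cluster). Names here are illustrative.
from itertools import combinations

class SCNMemory:
    def __init__(self, clusters, neurons_per_cluster):
        self.c = clusters
        self.l = neurons_per_cluster
        # Binary links keyed by ((cluster, neuron), (cluster, neuron)).
        self.links = set()

    def store(self, message):
        # One active neuron per cluster; fully interconnect them.
        nodes = [(i, s) for i, s in enumerate(message)]
        for a, b in combinations(nodes, 2):
            self.links.add((a, b))
            self.links.add((b, a))

    def retrieve(self, partial):
        # partial: list of known symbols, None for erased positions.
        known = [(i, s) for i, s in enumerate(partial) if s is not None]
        result = list(partial)
        for i, s in enumerate(partial):
            if s is not None:
                continue
            # Score each neuron in the erased cluster by how many known
            # neurons it is linked to; winner-take-all picks the maximum.
            scores = [sum(((i, n), k) in self.links for k in known)
                      for n in range(self.l)]
            result[i] = max(range(self.l), key=lambda n: scores[n])
        return result

mem = SCNMemory(clusters=4, neurons_per_cluster=16)
mem.store([3, 7, 1, 9])
print(mem.retrieve([3, 7, None, 9]))  # → [3, 7, 1, 9]
```

The hardware architectures in these papers map the scoring and winner-take-all steps to parallel logic; the sketch only shows the underlying algorithm.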

    Development of an area-efficient and low-power five-transistor SRAM for low-power SoC

    The purpose of this thesis is to introduce a new low-power, reliable, and high-performance five-transistor (5T) SRAM in 65 nm CMOS technology, which can be used for cache memory in processors and in low-power portable devices. An area reduction of ~13% compared to a conventional 6T cell is possible. A biasing ground line is charged by channel leakage current from memory cells in standby, and is used to pre-charge a single bit-line and bias the negative supply voltage of each memory cell to suppress standby leakage power. A major standby power reduction is achieved compared to conventional 5T and 6T designs, and up to ~30% compared to previous low-power 6T designs. Read, write, and standby performance and reliability issues are discussed and compared with conventional and low-power 6T SRAM designs.

    VLSI implementation of associative memories based on sparse clustered networks

    Associative memories retrieve stored information given partial input patterns, without requiring an explicit address as in indexed memories. A new family of associative memories based on Sparse Clustered Networks (SCNs) has recently been introduced that can store many more messages than classical Hopfield Neural Networks (HNNs). Hardware implementations of such memories benefit applications where fast, parallel lookup operations are required, such as cache memories in processors, database search engines, network routers, pattern recognition in image processing, and even virus intrusion detection in data centres. In this dissertation, we first present a proof-of-concept hardware implementation of SCNs and analyze its limitations. Two variants of a reduced-complexity fully-parallel hardware architecture are then introduced that eliminate resource-hungry winner-take-all modules and adders on the critical path, reducing hardware complexity by consuming 65% fewer FPGA lookup tables while increasing the operating frequency by approximately 1.9 times. We also explore the effect of varying design variables, such as the number of clusters, network nodes, and erased symbols, on the retrieval quality and the hardware resources. The fully-parallel architectures are suitable for applications that require critically low retrieval latencies but store no more than a few hundred entries. We present a new hardware architecture that features a new data-storage technique as well as a decoding method we refer to as Selective Decoding (SD-SCN). SD-SCN has been implemented in two variants: a fast architecture using on-chip RAM modules, and a slower but larger-capacity architecture using external memory to store associations.
SD-SCN yields orders-of-magnitude improvements in capacity and diversity without any error-performance penalty. We propose a low-power Content-Addressable Memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. The SCN-based architecture, on average, eliminates most of the parallel comparisons performed during a search. TSMC 65 nm CMOS technology was used for simulation purposes. After optimization of design parameters such as the number of CAM entries, the energy consumption and the search delay of the proposed design are 8% and 26% of those of the conventional NAND architecture, respectively, with a 10% area overhead. We compare our results with those of the latest works in the literature. A design methodology based on the silicon area, power budget, and performance requirements is presented. We also present the algorithm, architecture, and fabrication results of a non-volatile context-driven search engine that reduces energy consumption as well as computational delay compared to classical hardware- and software-based approaches. The fabricated hardware achieves a 13.6x memory reduction and an 89% energy saving compared to a classical field-based approach in hardware based on content-addressable memory (CAM). Furthermore, it performs search operations in 8.6x fewer clock cycles than the CAM, and in five orders of magnitude fewer clock cycles than a fabricated and measured ultra-low-power CPU-based counterpart running a classical search algorithm in software. The energy consumption of the proposed architecture is on average three orders of magnitude smaller than that of a software-based approach.
A magnetic tunnel junction (MTJ)-based logic-in-memory architecture is presented that allows simple routing and eliminates leakage current in standby, using 90 nm CMOS/MTJ-hybrid technologies.

    Algorithm and Architecture of Fully-Parallel Associative Memories Based on Sparse Clustered Networks

    Associative memories retrieve stored information given partial or erroneous input patterns. A new family of associative memories based on Sparse Clustered Networks (SCNs) has recently been introduced that can store many more messages than classical Hopfield Neural Networks (HNNs). In this paper, we propose fully-parallel hardware architectures of such memories for partial or erroneous inputs. The proposed architectures eliminate winner-take-all modules and thus reduce hardware complexity, consuming 65% fewer FPGA lookup tables and increasing the operating frequency by approximately 1.9 times compared to that of previous work. Furthermore, the scaling behaviour of the implemented architectures for various design choices is investigated. We explore the effect of varying design variables, such as the number of clusters, network nodes, and erased symbols, on the error performance and the hardware resources.
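The winner-take-all elimination described above can be sketched in software: when input symbols are erased rather than erroneous, the correct neuron in an erased cluster is linked to every known neuron, so a logical AND over incoming link signals can replace the adder-plus-maximum chain. The function names and parameters below are assumptions for illustration, not the paper's circuit:

```python
# Sketch of AND-rule retrieval in a sparse clustered network: a neuron in
# an erased cluster fires only if it is linked to *all* known neurons,
# which removes adders and winner-take-all logic from the critical path.
from itertools import combinations

def store(links, message):
    # Fully interconnect the message's active neurons with binary links.
    nodes = list(enumerate(message))
    for a, b in combinations(nodes, 2):
        links.add((a, b))
        links.add((b, a))

def retrieve_and_rule(links, l, partial):
    # partial: known symbols, None for erased positions; l neurons/cluster.
    known = [(i, s) for i, s in enumerate(partial) if s is not None]
    result = list(partial)
    for i, s in enumerate(partial):
        if s is None:
            # AND over known clusters instead of score-and-compare.
            active = [n for n in range(l)
                      if all(((i, n), k) in links for k in known)]
            # A unique survivor is the answer; several mean ambiguity.
            result[i] = active[0] if len(active) == 1 else None
    return result

links = set()
store(links, [2, 5, 8])
print(retrieve_and_rule(links, 16, [2, None, 8]))  # → [2, 5, 8]
```

In hardware, the inner `all(...)` collapses to a wide AND gate per neuron, which is the complexity reduction the abstract quantifies.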

    Algorithm and Architecture for a Low-Power Content-Addressable Memory Based on Sparse-Clustered Networks

    We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. The proposed architecture is based on a recently developed sparse clustered network using binary connections that, on average, eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower than that of a conventional low-power CAM design. Given an input tag, the proposed architecture computes a few possibilities for the location of the matched tag and performs comparisons on them to locate a single valid match. TSMC 65-nm CMOS technology was used for simulation purposes. Following a selection of design parameters, such as the number of CAM entries, the energy consumption and the search delay of the proposed design are 8% and 26% of those of the conventional NAND architecture, respectively, with a 10% area overhead. A design methodology based on the silicon-area and power budgets, and on performance requirements, is discussed.
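The candidate-narrowing idea in this abstract can be sketched as follows: binary links between tag symbols and entry addresses reduce a search to a handful of exact comparisons instead of comparing against every stored entry. The class and method names below are illustrative, not the proposed circuit:

```python
# Sketch of an SCN-style CAM lookup: links from (position, symbol) pairs
# to addresses narrow the search to a small candidate set, and only those
# candidates are compared exactly against the input tag.

class SCNCam:
    def __init__(self):
        self.entries = []   # address -> (tag, data)
        self.links = {}     # (position, symbol) -> set of addresses

    def write(self, tag, data):
        addr = len(self.entries)
        self.entries.append((tag, data))
        for pos, sym in enumerate(tag):
            self.links.setdefault((pos, sym), set()).add(addr)
        return addr

    def search(self, tag):
        # Intersect the address sets linked to each tag symbol; the
        # result is usually a single candidate address.
        candidates = None
        for pos, sym in enumerate(tag):
            linked = self.links.get((pos, sym), set())
            candidates = linked if candidates is None else candidates & linked
        # Exact comparison only on the surviving candidates.
        for addr in candidates or ():
            if self.entries[addr][0] == tag:
                return addr, self.entries[addr][1]
        return None

cam = SCNCam()
cam.write((0xA, 0x3, 0x7), "data-0")
cam.write((0xB, 0x3, 0x2), "data-1")
print(cam.search((0xB, 0x3, 0x2)))  # → (1, 'data-1')
```

The energy saving in the paper comes from performing the few surviving comparisons in hardware rather than activating every match line in parallel; the sketch only shows the filtering algorithm.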

    Algorithm and architecture for a multiple-field context-driven search engine using fully-parallel clustered associative memories

    In this paper, a context-driven search engine is presented based on a new family of associative memories. It stores only the associations between items from multiple search fields in the form of binary links, and merges repeated field items to reduce the memory requirements. It achieves a 13.6× reduction in memory bits and accesses, and an 8.6× reduction in clock cycles for search operations, compared to a classical field-based search structure using content-addressable memory. Furthermore, using parallel computational nodes, the proposed search engine performs searches in five orders of magnitude fewer clock cycles than a CPU-based counterpart running a classical search algorithm in software.
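The association-storage idea above can be sketched briefly: repeated field items are merged into single nodes, each record is stored only as binary links among its field items, and a query activates the known-field nodes and follows links to recover the missing field. The field names and records below are made-up examples:

```python
# Sketch of multi-field associative search: records become binary links
# among (field, value) nodes; repeated values share one node, so storage
# grows with the number of associations, not with repeated items.

records = [
    {"artist": "ella", "title": "summertime", "year": 1958},
    {"artist": "miles", "title": "so what", "year": 1959},
    {"artist": "ella", "title": "misty", "year": 1960},
]

links = set()
for rec in records:
    nodes = [(field, value) for field, value in rec.items()]
    for a in nodes:
        for b in nodes:
            if a != b:
                links.add((a, b))  # "ella" is stored as one merged node

def search(query, target_field, values):
    # Return target-field values linked to *every* query node; in the
    # hardware, these link checks run on parallel computational nodes.
    q = [(f, v) for f, v in query.items()]
    return [v for v in values
            if all(((target_field, v), k) in links for k in q)]

print(search({"artist": "ella", "year": 1960}, "title",
             ["summertime", "so what", "misty"]))  # → ['misty']
```

Note that merging nodes means a query on a single shared value (e.g. only `artist: "ella"`) can return several candidates, which matches the abstract's context-driven framing: more context fields narrow the result.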

    UFind [wireless car finding system]

    Weiibo Inc. presents UFind, a product that can be installed in cars as a built-in feature by car manufacturers, integrated into remote car control systems, or purchased off the shelf at warehouses or electronics stores. The device informs drivers whether they are getting closer to or farther from their car. The product has two major parts: the car module, which is placed inside the car, and the driver display module, which is carried by the driver. As an extra feature, the device can store information about the parking area, such as the lot number or parking floor number; when needed, the driver can view this information for more guidance.

    Reduced-complexity binary-weight-coded associative memories

    Associative memories retrieve stored information given partial or erroneous input patterns. Recently, a new family of associative memories based on Clustered Neural Networks (CNNs) was introduced that can store many more messages than classical Hopfield Neural Networks (HNNs). In this paper, we propose hardware architectures of such memories for partial or erroneous inputs. The proposed architectures eliminate winner-take-all modules and thus reduce hardware complexity, consuming 65% fewer FPGA lookup tables and increasing the operating frequency by approximately 1.9 times compared to that of previous work.