42 research outputs found

    FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

    Full text link
    Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the \textit{memory wall} challenge, due to the unique capability of \textit{processing-in-memory} within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We highly leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt neural synthesizer to enable the high density computation-resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NN. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.Comment: Accepted by ASPLOS 201

    On Temperature Dependency of Steep Subthreshold Slope in Dual-Independent-Gate FinFET

    Get PDF
    Dual-independent-gate silicon FinFET has demonstrated a steep subthreshold slope (SS) when a positive feedback induced by weak impact ionization is triggered. In this paper, we study the temperature dependency of the steep SS by characterizing the fabricated device from 100 to 380 K. The measured characteristics of SS show a reduced sensitivity to temperature as compared to conventional MOSFETs. Based on the temperature-dependent characterization, we further analyze the steep-SS characteristics and propose feasible improvements for optimizing the device performance

    Data processing and information classification— an in-memory approach

    Get PDF
    9noTo live in the information society means to be surrounded by billions of electronic devices full of sensors that constantly acquire data. This enormous amount of data must be processed and classified. A solution commonly adopted is to send these data to server farms to be remotely elaborated. The drawback is a huge battery drain due to high amount of information that must be exchanged. To compensate this problem data must be processed locally, near the sensor itself. But this solution requires huge computational capabilities. While microprocessors, even mobile ones, nowadays have enough computational power, their performance are severely limited by the Memory Wall problem. Memories are too slow, so microprocessors cannot fetch enough data from them, greatly limiting their performance. A solution is the Processing-In-Memory (PIM) approach. New memories are designed that can elaborate data inside them eliminating the Memory Wall problem. In this work we present an example of such a system, using as a case of study the Bitmap Indexing algorithm. Such algorithm is used to classify data coming from many sources in parallel. We propose a hardware accelerator designed around the Processing-In-Memory approach, that is capable of implementing this algorithm and that can also be reconfigured to do other tasks or to work as standard memory. The architecture has been synthesized using CMOS technology. The results that we have obtained highlights that, not only it is possible to process and classify huge amount of data locally, but also that it is possible to obtain this result with a very low power consumption.openopenAndrighetti, M. .; Turvani, G.; Santoro, G.; Vacca, M.; Marchesin, A.; Ottati, F.; Roch, M.R.; Graziano, M.; Zamboni, M.Andrighetti, M.; Turvani, G.; Santoro, G.; Vacca, M.; Marchesin, A.; Ottati, F.; Roch, M. R.; Graziano, M.; Zamboni, M

    Configurations of memristor-based APUF for improved performance

    Get PDF
    The memristor-based arbiter PUF (APUF) has great potential to be used for hardware security purposes. Its advantage is in its challenge-dependent delays, which cannot be modeled by machine learning algorithms. In this paper, further improvement is proposed, which are circuit configurations to the memristor-based APUF. Two configuration aspects were introduced namely varying the number of memristor per transistor, and the number of challenge and response bits. The purpose of the configurations is to introduce additional variation to the PUF, thereby improve PUF performance in terms of uniqueness, uniformity, and bit-aliasing; as well as resistance against support vector machine (SVM). Monte Carlo simulations were carried out on 180 nm and 130 nm, where both CMOS technologies have produced uniqueness, uniformity, and bit-aliasing values close to the ideal 50%; as well as SVM prediction accuracies no higher than 52.3%, therefore indicating excellent PUF performance

    FPGA-SPICE: A Simulation-Based Architecture Evaluation Framework for FPGAs

    Get PDF
    In this paper, we developed a simulation-based architecture evaluation framework for field-programmable gate arrays (FPGAs), called FPGA-SPICE, which enables automatic layout-level estimation and electrical simulations of FPGA architectures. FPGA-SPICE can automatically generate Verilog and SPICE netlists based on realistic FPGA configurations and a high-level eTtensible Markup Language-based FPGA architectural description language. The outputted Verilog netlists can be used to generate layouts of full FPGA fabrics through a semicustom design flow. SPICE simulation decks can be generated at three levels of complexity, namely, full-chip-level, grid-level, and component-level, providing different tradeoff between accuracy and simulation time. In order to enable such level of analysis, we presented two SPICE netlist partitioning techniques: loads extraction and parasitic net activity estimation. Electrical simulations showed that averaged over the selected benchmarks, the grid-/component-level approach can achieve 6.1x/7.5x execution speed-up with 9.9%/8.3% accuracy loss, respectively, compared to the full-chip level simulation. FPGA-SPICE was showcased through three different case studies: 1) an area breakdown analysis for static random access memory-based FPGAs, showing that configuration memories are a dominant factor; 2) a power breakdown comparison to analytical models, analyzing the source of accuracy loss; and 3) a robustness evaluation against process corners, studying their impact on energy consumption of full FPGA fabrics

    ALL-MASK: A Reconfigurable Logic Locking Method for Multicore Architecture with Sequential-Instruction-Oriented Key

    Full text link
    Intellectual property (IP) piracy has become a non-negligible problem as the integrated circuit (IC) production supply chain is becoming increasingly globalized and separated that enables attacks by potentially untrusted attackers. Logic locking is a widely adopted method to lock the circuit module with a key and prevent hackers from cracking it. The key is the critical aspect of logic locking, but the existing works have overlooked three possible challenges of the key: safety of key storage, easy key-attempt from interface and key-related overheads, bringing the further challenges of low error rate and small state space. In this work, the key is dynamically generated by utilizing the huge space of a CPU core, and the unlocking is performed implicitly through the interconnection inside the chip. A novel low-cost logic reconfigurable gate is together proposed with ferroelectric FET (FeFET) to mitigate the reverse engineering and removal attack. Compared to the common logic locking methods, our proposed approach is 19,945 times more time consuming to traverse all the possible combinations in only 9-bit-key condition. Furthermore, our technique let key length increases this complexity exponentially and ensure the logic obfuscation effect.Comment: 15 pages, 17 figure

    MRAM-Based FPGAs: A Survey

    Get PDF
    Over the last decade, field programmable gate arrays (FPGAs) have embraced heterogeneity in a transformative way by leveraging emerging memory devices along with conventional CMOS-based devices to realize technology-specific benefits. Memristive device technologies exhibit desirable characteristics such as non-volatility, scalability, near-zero leakage, radiation hardness, and more, making them promising alternatives for SRAM cells found in conventional SRAM-based FPGAs. In recent years, a significant amount of research has been performed to take advantage of these emerging technologies to develop fundamental building blocks of FPGAs like hybrid CMOS-memristive look-up tables (LUTs) and configurable logic blocks (CLBs). In this chapter, we will provide a brief overview of the previous work on hybrid CMOS-memristive FPGAs and their corresponding opportunities and challenges

    New Logic Synthesis As Nanotechnology Enabler (invited paper)

    Get PDF
    Nanoelectronics comprises a variety of devices whose electrical properties are more complex as compared to CMOS, thus enabling new computational paradigms. The potentially large space for innovation has to be explored in the search for technologies that can support large-scale and high- performance circuit design. Within this space, we analyze a set of emerging technologies characterized by a similar computational abstraction at the design level, i.e., a binary comparator or a majority voter. We demonstrate that new logic synthesis techniques, natively supporting this abstraction, are the technology enablers. We describe models and data-structures for logic design using emerging technologies and we show results of applying new synthesis algorithms and tools. We conclude that new logic synthesis methods are required to both evaluate emerging technologies and to achieve the best results in terms of area, power and performance

    Majority-based Synthesis for Nanotechnologies

    Get PDF
    We study the logic synthesis of emerging nanotechnologies whose elementary devices abstraction is a majority voter. We argue that synthesis tools, natively supporting the majority logic abstraction, are the technology enablers. This is because they allow designers to validate majority-based nanotechnologies on large-scale benchmarks. We describe models and data-structures for logic design with majority-based nanotechnologies and we show results of applying new synthesis algorithms and tools. We conclude that new logic synthesis methods are required to achieve a fair assessment on emerging nanotechnologies
    corecore