110 research outputs found

    Error-triggered Three-Factor Learning Dynamics for Crossbar Arrays

    Get PDF
    Recent breakthroughs suggest that local, approximate gradient descent learning is compatible with Spiking Neural Networks (SNNs). Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture that can learn in-situ as accurately as conventional processors is still missing. Here, we propose a subthreshold circuit architecture designed through insights obtained from machine learning and computational neuroscience that could achieve such accuracy. Using a surrogate gradient learning framework, we derive local, error-triggered learning dynamics compatible with crossbar arrays and the temporal dynamics of SNNs. The derivation reveals that circuits used for inference and training dynamics can be shared, which simplifies the circuit and suppresses the effects of fabrication mismatch. We present SPICE simulations on XFAB 180nm process, as well as large-scale simulations of the spiking neural networks on event-based benchmarks, including a gesture recognition task. Our results show that the number of updates can be reduced hundred-fold compared to the standard rule while achieving performances that are on par with the state-of-the-art

    ALL-MASK: A Reconfigurable Logic Locking Method for Multicore Architecture with Sequential-Instruction-Oriented Key

    Full text link
    Intellectual property (IP) piracy has become a non-negligible problem as the integrated circuit (IC) production supply chain is becoming increasingly globalized and separated that enables attacks by potentially untrusted attackers. Logic locking is a widely adopted method to lock the circuit module with a key and prevent hackers from cracking it. The key is the critical aspect of logic locking, but the existing works have overlooked three possible challenges of the key: safety of key storage, easy key-attempt from interface and key-related overheads, bringing the further challenges of low error rate and small state space. In this work, the key is dynamically generated by utilizing the huge space of a CPU core, and the unlocking is performed implicitly through the interconnection inside the chip. A novel low-cost logic reconfigurable gate is together proposed with ferroelectric FET (FeFET) to mitigate the reverse engineering and removal attack. Compared to the common logic locking methods, our proposed approach is 19,945 times more time consuming to traverse all the possible combinations in only 9-bit-key condition. Furthermore, our technique let key length increases this complexity exponentially and ensure the logic obfuscation effect.Comment: 15 pages, 17 figure

    Romanus: Robust Task Offloading in Modular Multi-Sensor Autonomous Driving Systems

    Full text link
    Due to the high performance and safety requirements of self-driving applications, the complexity of modern autonomous driving systems (ADS) has been growing, instigating the need for more sophisticated hardware which could add to the energy footprint of the ADS platform. Addressing this, edge computing is poised to encompass self-driving applications, enabling the compute-intensive autonomy-related tasks to be offloaded for processing at compute-capable edge servers. Nonetheless, the intricate hardware architecture of ADS platforms, in addition to the stringent robustness demands, set forth complications for task offloading which are unique to autonomous driving. Hence, we present ROMANUSROMANUS, a methodology for robust and efficient task offloading for modular ADS platforms with multi-sensor processing pipelines. Our methodology entails two phases: (i) the introduction of efficient offloading points along the execution path of the involved deep learning models, and (ii) the implementation of a runtime solution based on Deep Reinforcement Learning to adapt the operating mode according to variations in the perceived road scene complexity, network connectivity, and server load. Experiments on the object detection use case demonstrated that our approach is 14.99% more energy-efficient than pure local execution while achieving a 77.06% reduction in risky behavior from a robust-agnostic offloading baseline.Comment: This paper has been accepted to the 2022 International Conference On Computer-Aided Design (ICCAD 2022

    Creation and detection of hardware trojans using non-invasive off-the-shelf technologies

    Get PDF
    As a result of the globalisation of the semiconductor design and fabrication processes, integrated circuits are becoming increasingly vulnerable to malicious attacks. The most concerning threats are hardware trojans. A hardware trojan is a malicious inclusion or alteration to the existing design of an integrated circuit, with the possible effects ranging from leakage of sensitive information to the complete destruction of the integrated circuit itself. While the majority of existing detection schemes focus on test-time, they all require expensive methodologies to detect hardware trojans. Off-the-shelf approaches have often been overlooked due to limited hardware resources and detection accuracy. With the advances in technologies and the democratisation of open-source hardware, however, these tools enable the detection of hardware trojans at reduced costs during or after production. In this manuscript, a hardware trojan is created and emulated on a consumer FPGA board. The experiments to detect the trojan in a dormant and active state are made using off-the-shelf technologies taking advantage of different techniques such as Power Analysis Reports, Side Channel Analysis and Thermal Measurements. Furthermore, multiple attempts to detect the trojan are demonstrated and benchmarked. Our simulations result in a state-of-the-art methodology to accurately detect the trojan in both dormant and active states using off-the-shelf hardware

    Circuit topology and synthesis flow co-design for the development of computational ReRAM

    Get PDF
    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Emerging memory technologies will play a decisive role in the quest for more energy-efficient computing systems. Computational ReRAM structures based on resistive switching devices (memristors) have been explored for in-memory computations using the resistance of ReRAM cells for storage and for logic I/O representation. Such approach presents three major challenges: the support for a memristor-oriented logic style, the ad-hoc design of memory array driving circuitry for memory and logic operations, and the development of dedicated synthesis tools to instruct the multi-level operations required for the execution of an arbitrary logic function in memory. This work contributes towards the development of an automated design flow for ReRAM-based computational memories, highlighting some important HW-SW co-design considerations. We briefly present a case study concerning a synthesis flow for a nonstateful logic style and the co-design of the underlying 1T1R crossbar array driving circuit. The prototype of the synthesis flow is based on the ABC tool and the Z3 solver. It executes fast owing to the level-by-level mapping of logic gates. Moreover, it delivers a mapping that minimizes the logic function latency through parallel logic operations, while also using the less possible ReRAM cells.Supported by Synopsys, Chile, by the Chilean grants FONDECYT Regular 1221747 and ANID-Basal FB0008, and by the Spanish MCIN/AEI/10.13039/501100011033 grant PID2019-103869RB-C33Peer ReviewedPostprint (author's final draft

    Adaptive Knobs for Resource Efficient Computing

    Get PDF
    Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy

    Understanding Timing Error Characteristics From Overclocked Systolic Multiply–Accumulate Arrays in FPGAs

    Get PDF
    Artificial Intelligence (AI) hardware accelerators have seen tremendous developments in recent years due to the rapid growth of AI in multiple fields. Many such accelerators comprise a Systolic Multiply–Accumulate Array (SMA) as its computational brain. In this paper, we investigate the faulty output characterization of an SMA in a real silicon FPGA board. Experiments were run on a single Zybo Z7-20 board to control for process variation at nominal voltage and in small batches to control for temperature. The FPGA is rated up to 800 MHz in the data sheet due to the max frequency of the PLL, but the design is written using Verilog for the FPGA and C++ for the processor and synthesized with a chosen constraint of a 125 MHz clock. We then operate the system at a frequency range of 125 MHz to 450 MHz for the FPGA and the nominal 667 MHz for the processor core to produce timing errors in the FPGA without affecting the processor. Our extensive experimental platform with a hardware–software ecosystem provides a methodological pathway that reveals fascinating characteristics of SMA behavior under an overclocked environment. While one may intuitively expect that timing errors resulting from overclocked hardware may produce a wide variation in output values, our post-silicon evaluation reveals a lack of variation in erroneous output values. We found an intriguing pattern where error output values are stable for a given input across a range of operating frequencies far exceeding the rated frequency of the FPGA
    • …
    corecore