Search CORE

110 research outputs found

Error-triggered Three-Factor Learning Dynamics for Crossbar Arrays

Author: Eltawil Ahmed
Fouda Mohammed
Kurdahi Fadi
Neftci Emre O
Payvand Melika
Publication venue: eScholarship, University of California
Publication date: 14/10/2019
Field of study

Recent breakthroughs suggest that local, approximate gradient descent learning is compatible with Spiking Neural Networks (SNNs). Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture that can learn in-situ as accurately as conventional processors is still missing. Here, we propose a subthreshold circuit architecture designed through insights obtained from machine learning and computational neuroscience that could achieve such accuracy. Using a surrogate gradient learning framework, we derive local, error-triggered learning dynamics compatible with crossbar arrays and the temporal dynamics of SNNs. The derivation reveals that circuits used for inference and training dynamics can be shared, which simplifies the circuit and suppresses the effects of fabrication mismatch. We present SPICE simulations on XFAB 180nm process, as well as large-scale simulations of the spiking neural networks on event-based benchmarks, including a gesture recognition task. Our results show that the number of updates can be reduced hundred-fold compared to the standard rule while achieving performances that are on par with the state-of-the-art

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

ZORA

ALL-MASK: A Reconfigurable Logic Locking Method for Multicore Architecture with Sequential-Instruction-Oriented Key

Author: Chen Zhonghao
George Sumitha
Li Xueqing
Liu Yongpan
Narayanan Vijaykrishnan
Wang Jianfeng
Xu Yixin
Yang Huazhong
Ye Enze
Yu Tongguang
Zhang Jiahao
Zheng Ziheng
Publication venue
Publication date: 16/06/2022
Field of study

Intellectual property (IP) piracy has become a non-negligible problem as the integrated circuit (IC) production supply chain is becoming increasingly globalized and separated that enables attacks by potentially untrusted attackers. Logic locking is a widely adopted method to lock the circuit module with a key and prevent hackers from cracking it. The key is the critical aspect of logic locking, but the existing works have overlooked three possible challenges of the key: safety of key storage, easy key-attempt from interface and key-related overheads, bringing the further challenges of low error rate and small state space. In this work, the key is dynamically generated by utilizing the huge space of a CPU core, and the unlocking is performed implicitly through the interconnection inside the chip. A novel low-cost logic reconfigurable gate is together proposed with ferroelectric FET (FeFET) to mitigate the reverse engineering and removal attack. Compared to the common logic locking methods, our proposed approach is 19,945 times more time consuming to traverse all the possible combinations in only 9-bit-key condition. Furthermore, our technique let key length increases this complexity exponentially and ensure the logic obfuscation effect.Comment: 15 pages, 17 figure

arXiv.org e-Print Archive

Romanus: Robust Task Offloading in Modular Multi-Sensor Autonomous Driving Systems

Author: Chen Luke
Faruque Mohammad Abdullah Al
Odema Mohanad
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/07/2022
Field of study

Due to the high performance and safety requirements of self-driving applications, the complexity of modern autonomous driving systems (ADS) has been growing, instigating the need for more sophisticated hardware which could add to the energy footprint of the ADS platform. Addressing this, edge computing is poised to encompass self-driving applications, enabling the compute-intensive autonomy-related tasks to be offloaded for processing at compute-capable edge servers. Nonetheless, the intricate hardware architecture of ADS platforms, in addition to the stringent robustness demands, set forth complications for task offloading which are unique to autonomous driving. Hence, we present

ROMANUS

, a methodology for robust and efficient task offloading for modular ADS platforms with multi-sensor processing pipelines. Our methodology entails two phases: (i) the introduction of efficient offloading points along the execution path of the involved deep learning models, and (ii) the implementation of a runtime solution based on Deep Reinforcement Learning to adapt the operating mode according to variations in the perceived road scene complexity, network connectivity, and server load. Experiments on the object detection use case demonstrated that our approach is 14.99% more energy-efficient than pure local execution while achieving a 77.06% reduction in risky behavior from a robust-agnostic offloading baseline.Comment: This paper has been accepted to the 2022 International Conference On Computer-Aided Design (ICCAD 2022

arXiv.org e-Print Archive

Creation and detection of hardware trojans using non-invasive off-the-shelf technologies

Author: Bellekens Xavier
Rooney Catherine
Seeam Amar
Publication venue
Publication date: 01/07/2018
Field of study

As a result of the globalisation of the semiconductor design and fabrication processes, integrated circuits are becoming increasingly vulnerable to malicious attacks. The most concerning threats are hardware trojans. A hardware trojan is a malicious inclusion or alteration to the existing design of an integrated circuit, with the possible effects ranging from leakage of sensitive information to the complete destruction of the integrated circuit itself. While the majority of existing detection schemes focus on test-time, they all require expensive methodologies to detect hardware trojans. Off-the-shelf approaches have often been overlooked due to limited hardware resources and detection accuracy. With the advances in technologies and the democratisation of open-source hardware, however, these tools enable the detection of hardware trojans at reduced costs during or after production. In this manuscript, a hardware trojan is created and emulated on a consumer FPGA board. The experiments to detect the trojan in a dormant and active state are made using off-the-shelf technologies taking advantage of different techniques such as Power Analysis Reports, Side Channel Analysis and Thermal Measurements. Furthermore, multiple attempts to detect the trojan are demonstrated and benchmarked. Our simulations result in a state-of-the-art methodology to accurately detect the trojan in both dormant and active states using off-the-shelf hardware

Abertay Research Portal

University of Strathclyde Institutional Repository

Directory of Open Access Journals

Middlesex University Research Repository

Circuit topology and synthesis flow co-design for the development of computational ReRAM

Author: Fernandez Hernandez Carlos
Rubio Sola Jose Antonio
Vourkas Ioannis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Emerging memory technologies will play a decisive role in the quest for more energy-efficient computing systems. Computational ReRAM structures based on resistive switching devices (memristors) have been explored for in-memory computations using the resistance of ReRAM cells for storage and for logic I/O representation. Such approach presents three major challenges: the support for a memristor-oriented logic style, the ad-hoc design of memory array driving circuitry for memory and logic operations, and the development of dedicated synthesis tools to instruct the multi-level operations required for the execution of an arbitrary logic function in memory. This work contributes towards the development of an automated design flow for ReRAM-based computational memories, highlighting some important HW-SW co-design considerations. We briefly present a case study concerning a synthesis flow for a nonstateful logic style and the co-design of the underlying 1T1R crossbar array driving circuit. The prototype of the synthesis flow is based on the ABC tool and the Z3 solver. It executes fast owing to the level-by-level mapping of logic gates. Moreover, it delivers a mapping that minimizes the logic function latency through parallel logic operations, while also using the less possible ReRAM cells.Supported by Synopsys, Chile, by the Chilean grants FONDECYT Regular 1221747 and ANID-Basal FB0008, and by the Spanish MCIN/AEI/10.13039/501100011033 grant PID2019-103869RB-C33Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Resource-Aware Predictive Models in Cyber-Physical Systems

Author: AMIR MARAL
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Cyber-Physical Systems (CPS) are composed of computing devices interacting with physical systems. Model-based design is a powerful methodology in CPS design in the implementation of control systems. For instance, Model Predictive Control (MPC) is typically implemented in CPS applications, e.g., in path tracking of autonomous vehicles. MPC deploys a model to estimate the behavior of the physical system at future time instants for a specific time horizon. Ordinary Differential Equations (ODE) are the most commonly used models to emulate the behavior of continuous-time (non-)linear dynamical systems. A complex physical model may comprise thousands of ODEs that pose scalability, performance and power consumption challenges. One approach to address these model complexity challenges are frameworks that automate the development of model-to-model transformation. In this dissertation, a state-based model with tunable parameters is proposed to operate as a reconfigurable predictive model of the physical system. Moreover, we propose a run-time switching algorithm that selects the best model using machine learning. We employed a metric that formulates the trade-off between the error and computational savings due to model reduction. Building statistical models are constrained to having expert knowledge and an actual understanding of the modeled phenomenon or process. Also, statistical models may not produce solutions that are as robust in a real-world context as factors outside the model, like disruptions would not be taken into account. Machine learning models have emerged as a solution to account for the dynamic behavior of the environment and automate intelligence acquisition and refinement. Neural networks are machine learning models, well-known to have the ability to learn linear and nonlinear relations between input and output variables without prior knowledge. However, the ability to efficiently exploit resource-hungry neural networks in embedded resource-bound settings is a major challenge.Here, we proposed Priority Neuron Network (PNN), a resource-aware neural networks model that can be reconfigured into smaller sub-networks at runtime. This approach enables a trade-off between the model's computation time and accuracy based on available resources. The PNN model is memory efficient since it stores only one set of parameters to account for various sub-network sizes. We propose a training algorithm that applies regularization techniques to constrain the activation value of neurons and assigns a priority to each one. We consider the neuron's ordinal number as our priority criteria in that the priority of the neuron is inversely proportional to its ordinal number in the layer. This imposes a relatively sorted order on the activation values. We conduct experiments to employ our PNN as the predictive model in a CPS application. We can see that not only our technique will resolve the memory overhead of DNN architectures but it also reduces the computation overhead for the training process substantially. The training time is a critical matter especially in embedded systems where many NN models are trained on the fly

eScholarship - University of California

Adaptive Knobs for Resource Efficient Computing

Author: Kanduri Anil
Publication venue: fi=Turku Centre for Computer Science|en=Turku Centre for Computer Science|
Publication date: 13/12/2018
Field of study

Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy

UTUPub

Understanding Timing Error Characteristics From Overclocked Systolic Multiply–Accumulate Arrays in FPGAs

Author: Chakraborty Koushik
Chamberlin Andrew
Gerber Andrew
Goodale Tim
Gundi Noel Daniel
Palmer Mason
Roy Sanghamitra
Publication venue: Hosted by Utah State University Libraries
Publication date: 09/01/2024
Field of study

Artificial Intelligence (AI) hardware accelerators have seen tremendous developments in recent years due to the rapid growth of AI in multiple fields. Many such accelerators comprise a Systolic Multiply–Accumulate Array (SMA) as its computational brain. In this paper, we investigate the faulty output characterization of an SMA in a real silicon FPGA board. Experiments were run on a single Zybo Z7-20 board to control for process variation at nominal voltage and in small batches to control for temperature. The FPGA is rated up to 800 MHz in the data sheet due to the max frequency of the PLL, but the design is written using Verilog for the FPGA and C++ for the processor and synthesized with a chosen constraint of a 125 MHz clock. We then operate the system at a frequency range of 125 MHz to 450 MHz for the FPGA and the nominal 667 MHz for the processor core to produce timing errors in the FPGA without affecting the processor. Our extensive experimental platform with a hardware–software ecosystem provides a methodological pathway that reveals fascinating characteristics of SMA behavior under an overclocked environment. While one may intuitively expect that timing errors resulting from overclocked hardware may produce a wide variation in output values, our post-silicon evaluation reveals a lack of variation in erroneous output values. We found an intriguing pattern where error output values are stable for a given input across a range of operating frequencies far exceeding the rated frequency of the FPGA

DigitalCommons@USU