110 research outputs found
Error-triggered Three-Factor Learning Dynamics for Crossbar Arrays
Recent breakthroughs suggest that local, approximate gradient descent
learning is compatible with Spiking Neural Networks (SNNs). Although SNNs can
be scalably implemented using neuromorphic VLSI, an architecture that can learn
in-situ as accurately as conventional processors is still missing. Here, we
propose a subthreshold circuit architecture designed through insights obtained
from machine learning and computational neuroscience that could achieve such
accuracy. Using a surrogate gradient learning framework, we derive local,
error-triggered learning dynamics compatible with crossbar arrays and the
temporal dynamics of SNNs. The derivation reveals that circuits used for
inference and training dynamics can be shared, which simplifies the circuit and
suppresses the effects of fabrication mismatch. We present SPICE simulations on
XFAB 180nm process, as well as large-scale simulations of the spiking neural
networks on event-based benchmarks, including a gesture recognition task. Our
results show that the number of updates can be reduced hundred-fold compared to
the standard rule while achieving performances that are on par with the
state-of-the-art
ALL-MASK: A Reconfigurable Logic Locking Method for Multicore Architecture with Sequential-Instruction-Oriented Key
Intellectual property (IP) piracy has become a non-negligible problem as the
integrated circuit (IC) production supply chain is becoming increasingly
globalized and separated that enables attacks by potentially untrusted
attackers. Logic locking is a widely adopted method to lock the circuit module
with a key and prevent hackers from cracking it. The key is the critical aspect
of logic locking, but the existing works have overlooked three possible
challenges of the key: safety of key storage, easy key-attempt from interface
and key-related overheads, bringing the further challenges of low error rate
and small state space. In this work, the key is dynamically generated by
utilizing the huge space of a CPU core, and the unlocking is performed
implicitly through the interconnection inside the chip. A novel low-cost logic
reconfigurable gate is together proposed with ferroelectric FET (FeFET) to
mitigate the reverse engineering and removal attack. Compared to the common
logic locking methods, our proposed approach is 19,945 times more time
consuming to traverse all the possible combinations in only 9-bit-key
condition. Furthermore, our technique let key length increases this complexity
exponentially and ensure the logic obfuscation effect.Comment: 15 pages, 17 figure
Romanus: Robust Task Offloading in Modular Multi-Sensor Autonomous Driving Systems
Due to the high performance and safety requirements of self-driving
applications, the complexity of modern autonomous driving systems (ADS) has
been growing, instigating the need for more sophisticated hardware which could
add to the energy footprint of the ADS platform. Addressing this, edge
computing is poised to encompass self-driving applications, enabling the
compute-intensive autonomy-related tasks to be offloaded for processing at
compute-capable edge servers. Nonetheless, the intricate hardware architecture
of ADS platforms, in addition to the stringent robustness demands, set forth
complications for task offloading which are unique to autonomous driving.
Hence, we present , a methodology for robust and efficient task
offloading for modular ADS platforms with multi-sensor processing pipelines.
Our methodology entails two phases: (i) the introduction of efficient
offloading points along the execution path of the involved deep learning
models, and (ii) the implementation of a runtime solution based on Deep
Reinforcement Learning to adapt the operating mode according to variations in
the perceived road scene complexity, network connectivity, and server load.
Experiments on the object detection use case demonstrated that our approach is
14.99% more energy-efficient than pure local execution while achieving a 77.06%
reduction in risky behavior from a robust-agnostic offloading baseline.Comment: This paper has been accepted to the 2022 International Conference On
Computer-Aided Design (ICCAD 2022
Creation and detection of hardware trojans using non-invasive off-the-shelf technologies
As a result of the globalisation of the semiconductor design and fabrication processes, integrated circuits are becoming increasingly vulnerable to malicious attacks. The most concerning threats are hardware trojans. A hardware trojan is a malicious inclusion or alteration to the existing design of an integrated circuit, with the possible effects ranging from leakage of sensitive information to the complete destruction of the integrated circuit itself. While the majority of existing detection schemes focus on test-time, they all require expensive methodologies to detect hardware trojans. Off-the-shelf approaches have often been overlooked due to limited hardware resources and detection accuracy. With the advances in technologies and the democratisation of open-source hardware, however, these tools enable the detection of hardware trojans at reduced costs during or after production. In this manuscript, a hardware trojan is created and emulated on a consumer FPGA board. The experiments to detect the trojan in a dormant and active state are made using off-the-shelf technologies taking advantage of different techniques such as Power Analysis Reports, Side Channel Analysis and Thermal Measurements. Furthermore, multiple attempts to detect the trojan are demonstrated and benchmarked. Our simulations result in a state-of-the-art methodology to accurately detect the trojan in both dormant and active states using off-the-shelf hardware
Circuit topology and synthesis flow co-design for the development of computational ReRAM
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Emerging memory technologies will play a decisive role in the quest for more energy-efficient computing systems. Computational ReRAM structures based on resistive switching devices (memristors) have been explored for in-memory computations using the resistance of ReRAM cells for storage and for logic I/O representation. Such approach presents three major challenges: the support for a memristor-oriented logic style, the ad-hoc design of memory array driving circuitry for memory and logic operations, and the development of dedicated synthesis tools to instruct the multi-level operations required for the execution of an arbitrary logic function in memory. This work contributes towards the development of an automated design flow for ReRAM-based computational memories, highlighting some important HW-SW co-design considerations. We briefly present a case study concerning a synthesis flow for a nonstateful logic style and the co-design of the underlying 1T1R crossbar array driving circuit. The prototype of the synthesis flow is based on the ABC tool and the Z3 solver. It executes fast owing to the level-by-level mapping of logic gates. Moreover, it delivers a mapping that minimizes the logic function latency through parallel logic operations, while also using the less possible ReRAM cells.Supported by Synopsys, Chile, by the Chilean grants FONDECYT
Regular 1221747 and ANID-Basal FB0008, and by the Spanish
MCIN/AEI/10.13039/501100011033 grant PID2019-103869RB-C33Peer ReviewedPostprint (author's final draft
Recommended from our members
Resource-Aware Predictive Models in Cyber-Physical Systems
Cyber-Physical Systems (CPS) are composed of computing devices interacting with physical systems. Model-based design is a powerful methodology in CPS design in the implementation of control systems. For instance, Model Predictive Control (MPC) is typically implemented in CPS applications, e.g., in path tracking of autonomous vehicles. MPC deploys a model to estimate the behavior of the physical system at future time instants for a specific time horizon. Ordinary Differential Equations (ODE) are the most commonly used models to emulate the behavior of continuous-time (non-)linear dynamical systems. A complex physical model may comprise thousands of ODEs that pose scalability, performance and power consumption challenges. One approach to address these model complexity challenges are frameworks that automate the development of model-to-model transformation. In this dissertation, a state-based model with tunable parameters is proposed to operate as a reconfigurable predictive model of the physical system. Moreover, we propose a run-time switching algorithm that selects the best model using machine learning. We employed a metric that formulates the trade-off between the error and computational savings due to model reduction. Building statistical models are constrained to having expert knowledge and an actual understanding of the modeled phenomenon or process. Also, statistical models may not produce solutions that are as robust in a real-world context as factors outside the model, like disruptions would not be taken into account. Machine learning models have emerged as a solution to account for the dynamic behavior of the environment and automate intelligence acquisition and refinement. Neural networks are machine learning models, well-known to have the ability to learn linear and nonlinear relations between input and output variables without prior knowledge. However, the ability to efficiently exploit resource-hungry neural networks in embedded resource-bound settings is a major challenge.Here, we proposed Priority Neuron Network (PNN), a resource-aware neural networks model that can be reconfigured into smaller sub-networks at runtime. This approach enables a trade-off between the model's computation time and accuracy based on available resources. The PNN model is memory efficient since it stores only one set of parameters to account for various sub-network sizes. We propose a training algorithm that applies regularization techniques to constrain the activation value of neurons and assigns a priority to each one. We consider the neuron's ordinal number as our priority criteria in that the priority of the neuron is inversely proportional to its ordinal number in the layer. This imposes a relatively sorted order on the activation values. We conduct experiments to employ our PNN as the predictive model in a CPS application. We can see that not only our technique will resolve the memory overhead of DNN architectures but it also reduces the computation overhead for the training process substantially. The training time is a critical matter especially in embedded systems where many NN models are trained on the fly
Adaptive Knobs for Resource Efficient Computing
Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management.
Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems
through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption.
The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy
Understanding Timing Error Characteristics From Overclocked Systolic Multiply–Accumulate Arrays in FPGAs
Artificial Intelligence (AI) hardware accelerators have seen tremendous developments in recent years due to the rapid growth of AI in multiple fields. Many such accelerators comprise a Systolic Multiply–Accumulate Array (SMA) as its computational brain. In this paper, we investigate the faulty output characterization of an SMA in a real silicon FPGA board. Experiments were run on a single Zybo Z7-20 board to control for process variation at nominal voltage and in small batches to control for temperature. The FPGA is rated up to 800 MHz in the data sheet due to the max frequency of the PLL, but the design is written using Verilog for the FPGA and C++ for the processor and synthesized with a chosen constraint of a 125 MHz clock. We then operate the system at a frequency range of 125 MHz to 450 MHz for the FPGA and the nominal 667 MHz for the processor core to produce timing errors in the FPGA without affecting the processor. Our extensive experimental platform with a hardware–software ecosystem provides a methodological pathway that reveals fascinating characteristics of SMA behavior under an overclocked environment. While one may intuitively expect that timing errors resulting from overclocked hardware may produce a wide variation in output values, our post-silicon evaluation reveals a lack of variation in erroneous output values. We found an intriguing pattern where error output values are stable for a given input across a range of operating frequencies far exceeding the rated frequency of the FPGA
- …