19 research outputs found

    Towards Accurate Run-Time Hardware-Assisted Stealthy Malware Detection: A Lightweight, yet Effective Time Series CNN-Based Approach

    Get PDF
    According to recent security analysis reports, malicious software (a.k.a. malware) is rising at an alarming rate in numbers, complexity, and harmful purposes to compromise the security of modern computer systems. Recently, malware detection based on low-level hardware features (e.g., Hardware Performance Counters (HPCs) information) has emerged as an effective alternative solution to address the complexity and performance overheads of traditional software-based detection methods. Hardware-assisted Malware Detection (HMD) techniques depend on standard Machine Learning (ML) classifiers to detect signatures of malicious applications by monitoring built-in HPC registers during execution at run-time. Prior HMD methods though effective have limited their study on detecting malicious applications that are spawned as a separate thread during application execution, hence detecting stealthy malware patterns at run-time remains a critical challenge. Stealthy malware refers to harmful cyber attacks in which malicious code is hidden within benign applications and remains undetected by traditional malware detection approaches. In this paper, we first present a comprehensive review of recent advances in hardware-assisted malware detection studies that have used standard ML techniques to detect the malware signatures. Next, to address the challenge of stealthy malware detection at the processor’s hardware level, we propose StealthMiner, a novel specialized time series machine learning-based approach to accurately detect stealthy malware trace at run-time using branch instructions, the most prominent HPC feature. StealthMiner is based on a lightweight time series Fully Convolutional Neural Network (FCN) model that automatically identifies potentially contaminated samples in HPC-based time series data and utilizes them to accurately recognize the trace of stealthy malware. Our analysis demonstrates that using state-of-the-art ML-based malware detection methods is not effective in detecting stealthy malware samples since the captured HPC data not only represents malware but also carries benign applications’ microarchitectural data. The experimental results demonstrate that with the aid of our novel intelligent approach, stealthy malware can be detected at run-time with 94% detection performance on average with only one HPC feature, outperforming the detection performance of state-of-the-art HMD and general time series classification methods by up to 42% and 36%, respectively

    Efficient communication protection of many-core systems against active attackers

    Get PDF
    Many-core system-on-chips, together with their established communication infrastructures, Networks-on-Chip (NoC), are growing in complexity, which encourages the integration of third-party components to simplify and accelerate production processes. However, this also adversely exposes the surface for attacks through the injection of hardware Trojans. This work addresses active attacks on NoCs and focuses on the integrity and availability of transmitted data. In particular, we consider the modification and/or dropping of data during transmission as active attacks that might be performed by malicious routers. To mitigate the impact of such active attacks, we propose two lightweight solutions that respect the performance constraints of NoCs. Assuming the presence of symmetric keys, these approaches combine lightweight authentication codes for integrity protection with network coding for increased efficiency and robustness. The proposed solutions prevent undetected modifications and significantly increase availability through a reliable detection of attacks. The efficiency of these solutions is investigated in different scenarios using cycle-accurate simulations and the area overhead is analyzed relative to state-of-the-art many-core system. The results demonstrate that one authentication scheme with network coding protects the integrity of data to a low residual error of 1.36% at 0.2 attack probability with an area overhead of 2.68%. For faster and more flexible evaluation, an analytical approach is developed which is validated against the cycle-accurate simulations. The analytical approach is more than 1000× faster while having a maximum estimation error of 5%. Moreover, the analytical model provides a deeper insight into the system’s behavior. For example, it reveals which factors influence the performance parameters

    Multi-Tenant Cloud FPGA: A Survey on Security

    Full text link
    With the exponentially increasing demand for performance and scalability in cloud applications and systems, data center architectures evolved to integrate heterogeneous computing fabrics that leverage CPUs, GPUs, and FPGAs. FPGAs differ from traditional processing platforms such as CPUs and GPUs in that they are reconfigurable at run-time, providing increased and customized performance, flexibility, and acceleration. FPGAs can perform large-scale search optimization, acceleration, and signal processing tasks compared with power, latency, and processing speed. Many public cloud provider giants, including Amazon, Huawei, Microsoft, Alibaba, etc., have already started integrating FPGA-based cloud acceleration services. While FPGAs in cloud applications enable customized acceleration with low power consumption, it also incurs new security challenges that still need to be reviewed. Allowing cloud users to reconfigure the hardware design after deployment could open the backdoors for malicious attackers, potentially putting the cloud platform at risk. Considering security risks, public cloud providers still don't offer multi-tenant FPGA services. This paper analyzes the security concerns of multi-tenant cloud FPGAs, gives a thorough description of the security problems associated with them, and discusses upcoming future challenges in this field of study

    Defeating NewHope with a Single Trace

    Get PDF
    The key encapsulation method NewHope allows two parties to agree on a secret key. The scheme includes a private and a public key. While the public key is used to encipher a random shared secret, the private key enables to decipher the ciphertext. NewHope is a candidate in the NIST post-quantum project, whose aim is to standardize cryptographic systems that are secure against attacks originating from both quantum and classical computers. While NewHope relies on the theory of quantum-resistant lattice problems, practical implementations have shown vulnerabilities against side-channel attacks targeting the extraction of the private key. In this paper, we demonstrate a new attack on the shared secret. The target consists of the C reference implementation as submitted to the NIST contest, being executed on a Cortex-M4 processor. Based on power measurement, the complete shared secret can be extracted from data of one single trace only. Further, we analyze the impact of different compiler directives. When the code is compiled with optimization turned off, the shared secret can be read from an oscilloscope display directly with the naked eye. When optimizations are enabled, the attack requires some more sophisticated techniques, but the attack still works on single power traces

    High-Speed Hardware Architectures and FPGA Benchmarking of CRYSTALS-Kyber, NTRU, and Saber

    Get PDF
    Performance in hardware has typically played a significant role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as latency, number of operations per second, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. This paper presents high-speed hardware architectures for four lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing three NIST PQC finalists: CRYSTALS-Kyber, NTRU (with two distinct variants, NTRU-HPS and NTRU-HRSS), and Saber. We rank these candidates among each other and compare them with all other Round 3 KEMs based on the data from the previously reported work

    Explorando Árvores de Decisão Em Um Fluxo de Síntese Para Circuitos Aproximados

    Get PDF
    TCC (graduação) - Universidade Federal de Santa Catarina, Centro Tecnológico, Ciências da Computação.A etapa de síntese lógica no desenvolvimento de circuitos integrados tem se tornado mais desafiadora nos últimos anos devido ao crescimento na complexidade dos circuitos projetados. A escala nanométrica dos atuais transistores permite maior integração em um mesmo chip, implicando em funções mais complexas, com mais entradas e mais termos a serem otimizados. Muitos dos métodos de otimização lógica tradicional atingem seu limite de otimização em poucas dezenas de entradas, ou conseguem minimizar funções complexas ao custo de grandes tempos de execução. Algoritmos de Aprendizado de Máquina vêm se tornando mais comuns em diversas áreas da tecnologia, incluindo a área de ferramentas para o projeto de circuitos integrados, conhecida como Electronic Design Automation. Explorar essas ferramentas visando otimizar o tempo de execução da síntese lógica e analisar seu comportamento nos resultados de área e potência são os objetivos deste trabalho. Desta forma, este trabalho propõe um fluxo de síntese partindo de uma Tabela Verdade do circuito como entrada e apresentando soluções de síntese voltadas a baixo consumo energético, adotando, em alguns casos, Computação Aproximada. A otimização lógica desenvolvida é baseada em Árvores de Decisão, permitindo que a minimização lógica produza saídas exatas, assim como também possibilitando que alguma incerteza seja inserida no sistema através de restrições na profundidade da Árvore, por exemplo. A exploração de aproximação nas soluções minimizadas pode levar a circuitos com melhor eficiência energética mantendo níveis aceitáveis de precisão para aplicações tolerantes a erro. O fluxo de síntese proposto permite a comparação entre a síntese utilizando ferramentas tradicionais com aquela obtida pelo método de Árvore de Decisão. A saída da minimização lógica é direcionada para o fluxo OpenROAD. O OpenROAD é um projeto de Electronic Design Automation que utiliza diversas ferramentas open-source integradas e permite uma síntese standard cell mapeada para uma tecnologia ASIC. Os resultados observados mostram o quanto abordagens de Computação Aproximada podem ser promissoras, tendo reduzido a média de área, atraso e potência estudados para boa parte dos casos avaliados. Essa média teve um aumento quando se aplica Árvore de Decisão definindo uma acurácia de 100% quando comparado com o fluxo OpenROAD sem Árvore de Decisão, mas reduziu continuamente com a diminuição da precisão do circuito escolhida. Com uma acurácia de 90%, por exemplo, a média de área e potência do conjunto estudado diminuiu 41,49% e 47,19%, respectivamente, em comparação com os resultados obtidos com o OpenROAD.The logic synthesis step in the development of integrated circuits has become more chal- lenging in recent years due to the growth in the complexity of designed circuits. The nanometric scale of current transistors allows greater integration on the same chip, imply- ing more complex functions, with more inputs and more terms to be optimized. Many of the traditional logic optimization methods reach their optimization limit in a few dozen inputs, or manage to minimize complex functions at the cost of long execution times. Machine Learning Algorithms are becoming more common in several areas of technology, including the area of tools for the design of integrated circuits, known as Electronic Design Automation. Exploring these tools in order to optimize the execution time of the logic synthesis and analyze their behavior in the area and power results are the objectives of this work. Thus, this work proposes a synthesis flow starting from a circuits’ Truth Table as input and presenting synthesis solutions aiming low energy consumption, adopting, in some cases, Approximate Computing. The developed logic optimization is based on Decision Trees, allowing the logical minimization to produce exact outputs, as well as allowing some uncertainty to be inserted in the system through restrictions in the depth of the tree, for example. The approximation in the minimized solutions can lead to circuits with better energy efficiency maintaining acceptable levels of accuracy for error tolerant applications. The proposed synthesis flow allows the comparison between the synthesis using traditional tools and the one obtained by the Decision Tree method. The output of logical minimization is directed to the OpenROAD flow. OpenROAD is an Electronic Design Automation project that uses several integrated open-source tools and allows a standard cell synthesis mapped to an ASIC technology. The observed results show how promising Approximate Computing approaches can be, having reduced the average of the area, delay and power experienced for most of the evaluated cases. This average had an increase when Decision Tree is applied defining an accuracy of 100% when compared to the OpenROAD flow without Decision Tree, but it continuously decreased with the chosen circuits’ precision reduction. With an accuracy of 90%, for example, the average area and power of the studied set decreased by 41.49% and 47.19%, respectively, compared to the obtained OpenROAD results

    MOCAST 2021

    Get PDF
    The 10th International Conference on Modern Circuit and System Technologies on Electronics and Communications (MOCAST 2021) will take place in Thessaloniki, Greece, from July 5th to July 7th, 2021. The MOCAST technical program includes all aspects of circuit and system technologies, from modeling to design, verification, implementation, and application. This Special Issue presents extended versions of top-ranking papers in the conference. The topics of MOCAST include:Analog/RF and mixed signal circuits;Digital circuits and systems design;Nonlinear circuits and systems;Device and circuit modeling;High-performance embedded systems;Systems and applications;Sensors and systems;Machine learning and AI applications;Communication; Network systems;Power management;Imagers, MEMS, medical, and displays;Radiation front ends (nuclear and space application);Education in circuits, systems, and communications

    Energy and Area Efficient Machine Learning Architectures using Spin-Based Neurons

    Get PDF
    Recently, spintronic devices with low energy barrier nanomagnets such as spin orbit torque-Magnetic Tunnel Junctions (SOT-MTJs) and embedded magnetoresistive random access memory (MRAM) devices are being leveraged as a natural building block to provide probabilistic sigmoidal activation functions for RBMs. In this dissertation research, we use the Probabilistic Inference Network Simulator (PIN-Sim) to realize a circuit-level implementation of deep belief networks (DBNs) using memristive crossbars as weighted connections and embedded MRAM-based neurons as activation functions. Herein, a probabilistic interpolation recoder (PIR) circuit is developed for DBNs with probabilistic spin logic (p-bit)-based neurons to interpolate the probabilistic output of the neurons in the last hidden layer which are representing different output classes. Moreover, the impact of reducing the Magnetic Tunnel Junction\u27s (MTJ\u27s) energy barrier is assessed and optimized for the resulting stochasticity present in the learning system. In p-bit based DBNs, different defects such as variation of the nanomagnet thickness can undermine functionality by decreasing the fluctuation speed of the p-bit realized using a nanomagnet. A method is developed and refined to control the fluctuation frequency of the output of a p-bit device by employing a feedback mechanism. The feedback can alleviate this process variation sensitivity of p-bit based DBNs. This compact and low complexity method which is presented by introducing the self-compensating circuit can alleviate the influences of process variation in fabrication and practical implementation. Furthermore, this research presents an innovative image recognition technique for MNIST dataset on the basis of p-bit-based DBNs and TSK rule-based fuzzy systems. The proposed DBN-fuzzy system is introduced to benefit from low energy and area consumption of p-bit-based DBNs and high accuracy of TSK rule-based fuzzy systems. This system initially recognizes the top results through the p-bit-based DBN and then, the fuzzy system is employed to attain the top-1 recognition results from the obtained top outputs. Simulation results exhibit that a DBN-Fuzzy neural network not only has lower energy and area consumption than bigger DBN topologies while also achieving higher accuracy

    Appropriateness of Imperfect CNFET Based Circuits for Error Resilient Computing Systems

    Get PDF
    With superior device performance consistently reported in extremely scaled dimensions, low dimensional materials (LDMs), including Carbon Nanotube Field Effect Transistor (CNFET) based technology, have shown the potential to outperform silicon for future transistors in advanced technology nodes. Studies have also demonstrated orders of magnitude improvement in energy efficiency possible with LDMs, in comparison to silicon at competing technology nodes. However, the current fabrication processes for these materials suffer from process imperfections and still appear to be inadequate to compete with silicon for the mainstream high volume manufacturing. Among the LDMs, CNFETs are the most widely studied and closest to high volume manufacturing. Recent works have shown a significant increase in the complexity of CNFET based systems, including demonstration of a 16-bit microprocessor. However, the design of such systems has involved significantly wider-than-usual transistors and avoidance of certain logic combinations. The resulting complexity of several thousand transistors in such systems is still far from the requirements of high-performance general-purpose computing systems having billions of transistors. With the current progress of the process to fabricate CNFETs, their introduction in mainstream manufacturing is expected to take several more years. For an earlier technology adoption, CNFETs appear to be suited for error-resilient computing systems where errors during computation can be tolerated to a certain degree. Such systems relax the need for precise circuits and a perfect process while leveraging the potential energy benefits of CNFET technology in comparison to conventional Si technology. In this thesis, we explore the potential applications using an imperfect CNFET process for error-resilient computing systems, including the impact of the process imperfections at the system level and methods to improve it. The current most widely adopted fabrication process for CNFETs (separation and placement of solution-based CNTs) still suffers from process imperfections, mainly from open CNTs due to missing of CNTs (in trenches connecting source and drain of CNFET). A fair evaluation of the performance of CNFET based circuits should thus take into consideration the effect of open CNTs, resulting in reduced drive currents. At the circuit level, this leads to failures in meeting 1) the minimum frequency requirement (due to an increase in critical path delay), and 2) the noise suppression requirement. We present a methodology to accurately capture the effect of open CNT imperfection in the state-of-the-art CNFET model, for circuit-level performance evaluation (both delay and glitch vulnerability) of CNFET based circuits using SPICE. A Monte Carlo simulation framework is also provided to investigate the statistical effect of open CNT imperfection on circuit-level performance. We introduce essential metrics to evaluate glitch vulnerability and also provide an effective link between glitch vulnerability and circuit topology. The past few years have observed significant growth of interest in approximate computing for a wide range of applications, including signal processing, data mining, machine learning, image, video processing, etc. In such applications, the result quality is not compromised appreciably, even in the presence of few errors during computation. The ability to tolerate few errors during computation relaxes the need to have precise circuits. Thus the approximate circuits can be designed, with lesser nodes, reduced stages, and reduced capacitance at few nodes. Consequently, the approximate circuits could reduce critical path delays and enhanced noise suppression in comparison to precise circuits. We present a systematic methodology utilizing Reduced Ordered Binary Decision Diagrams (ROBDD) for generating approximate circuits by taking an example of 16-bit parallel prefix CNFET adder. The approximate adder generated using the proposed algorithm has ~ 5x reduction in the average number of nodes failing glitch criteria (along paths to primary output) and 43.4% lesser Energy Delay Product (EDP) even at high open CNT imperfection, in comparison to the ideal case of no open CNT imperfection, at a mean relative error of 3.3%. The recent boom of deep learning has been made possible by VLSI technology advancement resulting in hardware systems, which can support deep learning algorithms. These hardware systems intend to satisfy the high-energy efficiency requirement of such algorithms. The hardware supporting such algorithms adopts neuromorphic-computing architectures with significantly less energy compared to traditional Von Neumann architectures. Deep Neural Networks (DNNs) belonging to deep learning domain find its use in a wide range of applications such as image classification, speech recognition, etc. Recent hardware systems have demonstrated the implementation of complex neural networks at significantly less power. However, the complexity of applications and depths of DNNs are expected to drastically increase in the future, imposing a demanding requirement in terms of scalability and energy efficiency of hardware technology. CNFET technology can be an excellent alternative to meet the aggressive energy efficiency requirement for future DNNs. However, degradation in circuit-level performance due to open CNT imperfection can result in timing failure, thus distorting the shape of non-linear activation function, leading to a significant degradation in classification accuracy. We present a framework to obtain sigmoid activation function considering the effect of open CNT imperfection. A digital neuron is explored to generate the sigmoid activation function, which deviates from the ideal case under imperfect process and reduced time period (increased clock frequency). The inherent error resilience of DNNs, on the other hand, can be utilized to mitigate the impact of imperfect process and maintain the shape of the activation function. We use pruning of synaptic weights, which, combined with the proposed approximate neuron, significantly reduces the chance of timing failures and helps to maintain the activation function shape even at high process imperfection and higher clock frequencies. We also provide a framework to obtain classification accuracy of Deep Belief Networks (class of DNNs based on unsupervised learning) using the activation functions obtained from SPICE simulations. By using both approximate neurons and pruning of synaptic weights, we achieve excellent system accuracy (only < 0.5% accuracy drop) with 25% improvement in speed, significant EDP advantage (56.7% less) even at high process imperfection, in comparison to a base configuration of the precise neuron and no pruning with the ideal process, at no area penalty. In conclusion, this thesis provides directions for the potential applicability of CNFET based technology for error-resilient computing systems. For this purpose, we present methodologies, which provide approaches to assess and design CNFET based circuits, considering process imperfections. We accomplish a DBN framework for digit recognition, considering activation functions from SPICE simulations incorporating process imperfections. We demonstrate the effectiveness of using approximate neuron and synaptic weight pruning to mitigate the impact of high process imperfection on system accuracy
    corecore