199 research outputs found

    SCALING UP TASK EXECUTION ON RESOURCE-CONSTRAINED SYSTEMS

    Get PDF
    The ubiquity of executing machine learning tasks on embedded systems with constrained resources has made efficient execution of neural networks on these systems under the CPU, memory, and energy constraints increasingly important. Different from high-end computing systems where resources are abundant and reliable, resource-constrained systems only have limited computational capability, limited memory, and limited energy supply. This dissertation focuses on how to take full advantage of the limited resources of these systems in order to improve task execution efficiency from different aspects of the execution pipeline. While the existing literature primarily aims at solving the problem by shrinking the model size according to the resource constraints, this dissertation aims to improve the execution efficiency for a given set of tasks from the following two aspects. Firstly, we propose SmartON, which is the first batteryless active event detection system that considers both the event arrival pattern as well as the harvested energy to determine when the system should wake up and what the duty cycle should be. Secondly, we propose Antler, which exploits the affinity between all pairs of tasks in a multitask inference system to construct a compact graph representation of the task set for a given overall size budget. To achieve the aforementioned algorithmic proposals, we propose the following hardware solutions. One is a controllable capacitor array that can expand the system’s energy storage on-the-fly. The other is a FRAM array that can accommodate multiple neural networks running on one system.Doctor of Philosoph

    Adaptive vehicular networking with Deep Learning

    Get PDF
    Vehicular networks have been identified as a key enabler for future smart traffic applications aiming to improve on-road safety, increase road traffic efficiency, or provide advanced infotainment services to improve on-board comfort. However, the requirements of smart traffic applications also place demands on vehicular networks’ quality in terms of high data rates, low latency, and reliability, while simultaneously meeting the challenges of sustainability, green network development goals and energy efficiency. The advances in vehicular communication technologies combined with the peculiar characteristics of vehicular networks have brought challenges to traditional networking solutions designed around fixed parameters using complex mathematical optimisation. These challenges necessitate greater intelligence to be embedded in vehicular networks to realise adaptive network optimisation. As such, one promising solution is the use of Machine Learning (ML) algorithms to extract hidden patterns from collected data thus formulating adaptive network optimisation solutions with strong generalisation capabilities. In this thesis, an overview of the underlying technologies, applications, and characteristics of vehicular networks is presented, followed by the motivation of using ML and a general introduction of ML background. Additionally, a literature review of ML applications in vehicular networks is also presented drawing on the state-of-the-art of ML technology adoption. Three key challenging research topics have been identified centred around network optimisation and ML deployment aspects. The first research question and contribution focus on mobile Handover (HO) optimisation as vehicles pass between base stations; a Deep Reinforcement Learning (DRL) handover algorithm is proposed and evaluated against the currently deployed method. Simulation results suggest that the proposed algorithm can guarantee optimal HO decision in a realistic simulation setup. The second contribution explores distributed radio resource management optimisation. Two versions of a Federated Learning (FL) enhanced DRL algorithm are proposed and evaluated against other state-of-the-art ML solutions. Simulation results suggest that the proposed solution outperformed other benchmarks in overall resource utilisation efficiency, especially in generalisation scenarios. The third contribution looks at energy efficiency optimisation on the network side considering a backdrop of sustainability and green networking. A cell switching algorithm was developed based on a Graph Neural Network (GNN) model and the proposed energy efficiency scheme is able to achieve almost 95% of the metric normalised energy efficiency compared against the “ideal” optimal energy efficiency benchmark and is capable of being applied in many more general network configurations compared with the state-of-the-art ML benchmark

    FLIPS: Federated Learning using Intelligent Participant Selection

    Full text link
    This paper presents the design and implementation of FLIPS, a middleware system to manage data and participant heterogeneity in federated learning (FL) training workloads. In particular, we examine the benefits of label distribution clustering on participant selection in federated learning. FLIPS clusters parties involved in an FL training job based on the label distribution of their data apriori, and during FL training, ensures that each cluster is equitably represented in the participants selected. FLIPS can support the most common FL algorithms, including FedAvg, FedProx, FedDyn, FedOpt and FedYogi. To manage platform heterogeneity and dynamic resource availability, FLIPS incorporates a straggler management mechanism to handle changing capacities in distributed, smart community applications. Privacy of label distributions, clustering and participant selection is ensured through a trusted execution environment (TEE). Our comprehensive empirical evaluation compares FLIPS with random participant selection, as well as two other "smart" selection mechanisms - Oort and gradient clustering using two real-world datasets, two different non-IID distributions and three common FL algorithms (FedYogi, FedProx and FedAvg). We demonstrate that FLIPS significantly improves convergence, achieving higher accuracy by 17 - 20 % with 20 - 60 % lower communication costs, and these benefits endure in the presence of straggler participants

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Get PDF
    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context the concept of programmable data planes allows directly to program the packet processing pipeline of networking devices and create custom control plane algorithms. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet high demands of next-generation networks like 5G, Internet of Things, cloud computing, and industry 4.0. P4 is the most popular technology to implement programmable data planes. However, programmable data planes, and in particular, the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking. The research of this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane as there are no satisfying state of the art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic which has only very limited support of high-performance forwarding platforms yet. The main results of this thesis are published as 8 peer-reviewed and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and the extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results. They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts

    Collaborative Honeypot Defense in UAV Networks: A Learning-Based Game Approach

    Full text link
    The proliferation of unmanned aerial vehicles (UAVs) opens up new opportunities for on-demand service provisioning anywhere and anytime, but also exposes UAVs to a variety of cyber threats. Low/medium interaction honeypots offer a promising lightweight defense for actively protecting mobile Internet of things, particularly UAV networks. While previous research has primarily focused on honeypot system design and attack pattern recognition, the incentive issue for motivating UAV's participation (e.g., sharing trapped attack data in honeypots) to collaboratively resist distributed and sophisticated attacks remains unexplored. This paper proposes a novel game-theoretical collaborative defense approach to address optimal, fair, and feasible incentive design, in the presence of network dynamics and UAVs' multi-dimensional private information (e.g., valid defense data (VDD) volume, communication delay, and UAV cost). Specifically, we first develop a honeypot game between UAVs and the network operator under both partial and complete information asymmetry scenarios. The optimal VDD-reward contract design problem with partial information asymmetry is then solved using a contract-theoretic approach that ensures budget feasibility, truthfulness, fairness, and computational efficiency. In addition, under complete information asymmetry, we devise a distributed reinforcement learning algorithm to dynamically design optimal contracts for distinct types of UAVs in the time-varying UAV network. Extensive simulations demonstrate that the proposed scheme can motivate UAV's cooperation in VDD sharing and improve defensive effectiveness, compared with conventional schemes.Comment: Accepted Aug. 28, 2023 by IEEE Transactions on Information Forensics & Security. arXiv admin note: text overlap with arXiv:2209.1381

    On designing hardware accelerator-based systems: interfaces, taxes and benefits

    Full text link
    Complementary Metal Oxide Semiconductor (CMOS) Technology scaling has slowed down. One promising approach to sustain the historic performance improvement of computing systems is to utilize hardware accelerators. Today, many commercial computing systems integrate one or more accelerators, with each accelerator optimized to efficiently execute specific tasks. Over the years, there has been a substantial amount of research on designing hardware accelerators for machine learning (ML) training and inference tasks. Hardware accelerators are also widely employed to accelerate data privacy and security algorithms. In particular, there is currently a growing interest in the use of hardware accelerators for accelerating homomorphic encryption (HE) based privacy-preserving computing. While the use of hardware accelerators is promising, a realistic end-to-end evaluation of an accelerator when integrated into the full system often reveals that the benefits of an accelerator are not always as expected. Simply assessing the performance of the accelerated portion of an application, such as the inference kernel in ML applications, during performance analysis can be misleading. When designing an accelerator-based system, it is critical to evaluate the system as a whole and account for all the accelerator taxes. In the first part of our research, we highlight the need for a holistic, end-to-end analysis of workloads using ML and HE applications. Our evaluation of an ML application for a database management system (DBMS) shows that the benefits of offloading ML inference to accelerators depend on several factors, including backend hardware, model complexity, data size, and the level of integration between the ML inference pipeline and the DBMS. We also found that the end-to-end performance improvement is bottlenecked by data retrieval and pre-processing, as well as inference. Additionally, our evaluation of an HE video encryption application shows that while HE client-side operations, i.e., message-to- ciphertext and ciphertext-to-message conversion operations, are bottlenecked by number theoretic transform (NTT) operations, accelerating NTT in hardware alone is not sufficient to get enough application throughput (frame rate per second) improvement. We need to address all bottlenecks such as error sampling, encryption, and decryption in message-to-ciphertext and ciphertext-to-message conversion pipeline. In the second part of our research, we address the lack of a scalable evaluation infrastructure for building and evaluating accelerator-based systems. To solve this problem, we propose a robust and scalable software-hardware framework for accelerator evaluation, which uses an open-source RISC-V based System-on-Chip (SoC) design called BlackParrot. This framework can be utilized by accelerator designers and system architects to perform an end-to-end performance analysis of coherent and non-coherent accelerators while carefully accounting for the interaction between the accelerator and the rest of the system. In the third part of our research, we present RISE, which is a full RISC-V SoC designed to efficiently perform message-to-ciphertext and ciphertext-to-message conversion operations. RISE comprises of a BlackParrot core and an efficient custom-designed accelerator tailored to accelerate end-to-end message-to-ciphertext and ciphertext-to-message conversion operations. Our RTL-based evaluation demonstrates that RISE improves the throughput of the video encryption application by 10x-27x for different frame resolutions

    Flexible Hardware-based Security-aware Mechanisms and Architectures

    Get PDF
    For decades, software security has been the primary focus in securing our computing platforms. Hardware was always assumed trusted, and inherently served as the foundation, and thus the root of trust, of our systems. This has been further leveraged in developing hardware-based dedicated security extensions and architectures to protect software from attacks exploiting software vulnerabilities such as memory corruption. However, the recent outbreak of microarchitectural attacks has shaken these long-established trust assumptions in hardware entirely, thereby threatening the security of all of our computing platforms and bringing hardware and microarchitectural security under scrutiny. These attacks have undeniably revealed the grave consequences of hardware/microarchitecture security flaws to the entire platform security, and how they can even subvert the security guarantees promised by dedicated security architectures. Furthermore, they shed light on the sophisticated challenges particular to hardware/microarchitectural security; it is more critical (and more challenging) to extensively analyze the hardware for security flaws prior to production, since hardware, unlike software, cannot be patched/updated once fabricated. Hardware cannot reliably serve as the root of trust anymore, unless we develop and adopt new design paradigms where security is proactively addressed and scrutinized across the full stack of our computing platforms, at all hardware design and implementation layers. Furthermore, novel flexible security-aware design mechanisms are required to be incorporated in processor microarchitecture and hardware-assisted security architectures, that can practically address the inherent conflict between performance and security by allowing that the trade-off is configured to adapt to the desired requirements. In this thesis, we investigate the prospects and implications at the intersection of hardware and security that emerge across the full stack of our computing platforms and System-on-Chips (SoCs). On one front, we investigate how we can leverage hardware and its advantages, in contrast to software, to build more efficient and effective security extensions that serve security architectures, e.g., by providing execution attestation and enforcement, to protect the software from attacks exploiting software vulnerabilities. We further propose that they are microarchitecturally configured at runtime to provide different types of security services, thus adapting flexibly to different deployment requirements. On another front, we investigate how we can protect these hardware-assisted security architectures and extensions themselves from microarchitectural and software attacks that exploit design flaws that originate in the hardware, e.g., insecure resource sharing in SoCs. More particularly, we focus in this thesis on cache-based side-channel attacks, where we propose sophisticated cache designs, that fundamentally mitigate these attacks, while still preserving performance by enabling that the performance security trade-off is configured by design. We also investigate how these can be incorporated into flexible and customizable security architectures, thus complementing them to further support a wide spectrum of emerging applications with different performance/security requirements. Lastly, we inspect our computing platforms further beneath the design layer, by scrutinizing how the actual implementation of these mechanisms is yet another potential attack surface. We explore how the security of hardware designs and implementations is currently analyzed prior to fabrication, while shedding light on how state-of-the-art hardware security analysis techniques are fundamentally limited, and the potential for improved and scalable approaches

    Full Stack Optimization of Transformer Inference: a Survey

    Full text link
    Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate, and this has made their deployment in latency-sensitive applications challenging. As such, there has been an increased focus on making Transformer models more efficient, with methods that range from changing the architecture design, all the way to developing dedicated domain-specific accelerators. In this work, we survey different approaches for efficient Transformer inference, including: (i) analysis and profiling of the bottlenecks in existing Transformer architectures and their similarities and differences with previous convolutional models; (ii) implications of Transformer architecture on hardware, including the impact of non-linear operations such as Layer Normalization, Softmax, and GELU, as well as linear operations, on hardware design; (iii) approaches for optimizing a fixed Transformer architecture; (iv) challenges in finding the right mapping and scheduling of operations for Transformer models; and (v) approaches for optimizing Transformer models by adapting the architecture using neural architecture search. Finally, we perform a case study by applying the surveyed optimizations on Gemmini, the open-source, full-stack DNN accelerator generator, and we show how each of these approaches can yield improvements, compared to previous benchmark results on Gemmini. Among other things, we find that a full-stack co-design approach with the aforementioned methods can result in up to 88.7x speedup with a minimal performance degradation for Transformer inference

    Jornadas Nacionales de Investigación en Ciberseguridad: actas de las VIII Jornadas Nacionales de Investigación en ciberseguridad: Vigo, 21 a 23 de junio de 2023

    Get PDF
    Jornadas Nacionales de Investigación en Ciberseguridad (8ª. 2023. Vigo)atlanTTicAMTEGA: Axencia para a modernización tecnolóxica de GaliciaINCIBE: Instituto Nacional de Cibersegurida

    DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips

    Full text link
    To understand and improve DRAM performance, reliability, security and energy efficiency, prior works study characteristics of commodity DRAM chips. Unfortunately, state-of-the-art open source infrastructures capable of conducting such studies are obsolete, poorly supported, or difficult to use, or their inflexibility limit the types of studies they can conduct. We propose DRAM Bender, a new FPGA-based infrastructure that enables experimental studies on state-of-the-art DRAM chips. DRAM Bender offers three key features at the same time. First, DRAM Bender enables directly interfacing with a DRAM chip through its low-level interface. This allows users to issue DRAM commands in arbitrary order and with finer-grained time intervals compared to other open source infrastructures. Second, DRAM Bender exposes easy-to-use C++ and Python programming interfaces, allowing users to quickly and easily develop different types of DRAM experiments. Third, DRAM Bender is easily extensible. The modular design of DRAM Bender allows extending it to (i) support existing and emerging DRAM interfaces, and (ii) run on new commercial or custom FPGA boards with little effort. To demonstrate that DRAM Bender is a versatile infrastructure, we conduct three case studies, two of which lead to new observations about the DRAM RowHammer vulnerability. In particular, we show that data patterns supported by DRAM Bender uncovers a larger set of bit-flips on a victim row compared to the data patterns commonly used by prior work. We demonstrate the extensibility of DRAM Bender by implementing it on five different FPGAs with DDR4 and DDR3 support. DRAM Bender is freely and openly available at https://github.com/CMU-SAFARI/DRAM-Bender.Comment: To appear in TCAD 202
    corecore