50 research outputs found

    Approximate computing design exploration through data lifetime metrics

    Get PDF
    When designing an approximate computing system, the selection of the resources to modify is key. It is important that the error introduced in the system remains reasonable, but the size of the design exploration space can make this extremely difficult. In this paper, we propose to exploit a new metric for this selection: data lifetime. The concept comes from the field of reliability, where it can guide selective hardening: the more often a resource handles "live" data, the more critical it be-comes, the more important it will be to protect it. In this paper, we propose to use this same metric in a new way: identify the less critical resources as approximation targets in order to minimize the impact on the global system behavior and there-fore decrease the impact of approximation while increasing gains on other criteria

    CHERI Macaroons: Efficient, host-based access control for cyber-physical systems

    Get PDF
    Cyber-Physical Systems (CPS) often rely on network boundary defence as a primary means of access control; therefore, the compromise of one device threatens the security of all devices within the boundary. Resource and real-time constraints, tight hardware/software coupling, and decades-long service lifetimes complicate efforts for more robust, host-based access control mechanisms. Distributed capability systems provide opportunities for restoring access control to resource-owning devices; however, such a protection model requires a capability-based architecture for CPS devices as well as task compartmentalisation to be effective. This paper demonstrates hardware enforcement of network bearer tokens using an efficient translation between CHERI (Capability Hardware Enhanced RISC Instructions) architectural capabilities and Macaroon network tokens. While this method appears to generalise to any network-based access control problem, we specifically consider CPS, as our method is well-suited for controlling resources in the physical domain. We demonstrate the method in a distributed robotics application and in a hierarchical industrial control application, and discuss our plans to evaluate and extend the method.Gates Cambridge Scholarshi

    On the Reliability Assessment of Artificial Neural Networks Running on AI-Oriented MPSoCs

    Get PDF
    Nowadays, the usage of electronic devices running artificial neural networks (ANNs)-based applications is spreading in our everyday life. Due to their outstanding computational capabilities, ANNs have become appealing solutions for safety-critical systems as well. Frequently, they are considered intrinsically robust and fault tolerant for being brain-inspired and redundant computing models. However, when ANNs are deployed on resource-constrained hardware devices, single physical faults may compromise the activity of multiple neurons. Therefore, it is crucial to assess the reliability of the entire neural computing system, including both the software and the hardware components. This article systematically addresses reliability concerns for ANNs running on multiprocessor system-on-a-chips (MPSoCs). It presents a methodology to assign resilience scores to individual neurons and, based on that, schedule the workload of an ANN on the target MPSoC so that critical neurons are neatly distributed among the available processing elements. This reliability-oriented methodology exploits an integer linear programming solver to find the optimal solution. Experimental results are given for three different convolutional neural networks trained on MNIST, SVHN, and CIFAR-10. We carried out a comprehensive assessment on an open-source artificial intelligence-based RISC-V MPSoC. The results show the reliability improvements of the proposed methodology against the traditional scheduling

    Fast Nonblocking Persistence for Concurrent Data Structures

    Get PDF
    We present a fully lock-free variant of our recent Montage system for persistent data structures. The variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature - dramatically faster than achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM

    Combustion Characteristics of Hydrogen/Air Mixtures in a Plasma-Assisted Micro Combustor

    Get PDF
    This work performs an analysis of plasma-assisted non-premixed H2-air flames in Y-shaped micro combustors in the presence of field emission dielectric barrier discharge (FE-DBD) plasma actuators. The combustion, flow, and heat transfer characteristics are numerically investigated, and the effect of sinusoidal plasma discharges on combustion performance is examined at various equivalence ratios (φ). A coupled plasma and chemical kinetic model is implemented, using a zero-dimensional model based on the solution of the Boltzmann equation and the ZDPlasKin toolbox to compute net charges and radical generation rates. The estimated body forces, radical production rates, and power densities in the plasma regions are then coupled with hydrogen combustion in the microchannel. Plasma-assisted combustion reveals improvements in flame length and maximum gas temperature. The results demonstrate that FE-DBDs can enhance mixing and complete the combustion of unreacted fuel, preventing flame extinction. It is shown that even in cases of radical and thermal quenching, these plasma actuators are essential for stabilizing the flame

    A generator of numerically-tailored and high-throughput accelerators for batched GEMMs

    Get PDF
    We propose a hardware generator of GEMM accelerators. Our generator produces vendor-agnostic HDL describing highly customizable systolic arrays guided by accuracy and energy efficiency goals. The generated arrays have three main novel aspects. First, the accelerators handle a large variety of computer number formats using intermediate representations based on our Sign Scale Significand (S3) format. Second, the processing elements perform all intermediate dot-product arithmetic operations required by the GEMM kernel without any intermediate rounding, which makes it possible to deliver better energy efficiency than state-of-the-art approaches while offering more accuracy and reproducible results. Third, our accelerators feature the Half-Speed Sink Down (HSSD) mechanism, which maximizes the overlap of host-accelerator data transfers with GEMM computations.We evaluate our automatically generated designs in a cutting-edge setup composed of a POWER9 host, CAPI (Coherent Accelerator Processor Interface) link, and a Virtex Ultrascale Plus FPGA. Arrays can operate at the speed of the link and saturate it to reach a 13GB/s throughput. Our fine-grain customization approach allows to cover a wide range of accuracy versus efficiency scenarios and can reach 0.65GOps/s/W while producing 1024 accurate bits or 148.7GOps/s/W with 6 accurate bits. Our configurations achieve up to 1613GOps/s system performance and power efficiencies of up to 240GOps/s/W for the FPGA. This automatic generator is the first being able to produce such a variety of designs. We improve the single-precision energy efficiency of state-of-the-art FPGA GEMM accelerators by 1.86×.This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 955606 Marc Casas is supported by Grant RYC-2017-23269 funded by MCIN/AEI/ 10.13039/501100011033 and by “ESF Investing in your future”Peer ReviewedPostprint (author's final draft

    Simurgh: a fully decentralized and secure NVMM user space file system

    Get PDF
    The availability of non-volatile main memory (NVMM) has started a new era for storage systems and NVMM specific file systems can support extremely high data and metadata rates, which are required by many HPC and data-intensive applications. Scaling metadata performance within NVMM file systems is nevertheless often restricted by the Linux kernel storage stack, while simply moving metadata management to the user space can compromise security or flexibility. This paper introduces Simurgh, a hardware-assisted user space file system with decentralized metadata management that allows secure metadata updates from within user space. Simurgh guarantees consistency, durability, and ordering of updates without sacrificing scalability. Security is enforced by only allowing NVMM access from protected user space functions, which can be implemented through two proposed instructions. Comparisons with other NVMM file systems show that Simurgh improves metadata performance up to 18x and application performance up to 89% compared to the second-fastest file system.This work has been supported by the European Comission’s BigStorage project H2020-MSCA-ITN2014-642963. It is also supported by the Big Data in Atmospheric Physics (BINARY) project, funded by the Carl Zeiss Foundation under Grant No.: P2018-02-003.Peer ReviewedPostprint (author's final draft

    RISC-Vlim, a RISC-V Framework for Logic-in-Memory Architectures

    Get PDF
    Most modern CPU architectures are based on the von Neumann principle, where memory and processing units are separate entities. Although processing unit performance has improved over the years, memory capacity has not followed the same trend, creating a performance gap between them. This problem is known as the "memory wall" and severely limits the performance of a microprocessor. One of the most promising solutions is the "logic-in-memory" approach. It consists of merging memory and logic units, enabling data to be processed directly inside the memory itself. Here we propose an RISC-V framework that supports logic-in-memory operations. We substitute data memory with a circuit capable of storing data and of performing in-memory computation. The framework is based on a standard memory interface, so different logic-in-memory architectures can be inserted inside the microprocessor, based both on CMOS and emerging technologies. The main advantage of this framework is the possibility of comparing the performance of different logic-in-memory solutions on code execution. We demonstrate the effectiveness of the framework using a CMOS volatile memory and a memory based on a new emerging technology, racetrack logic. The results demonstrate an improvement in algorithm execution speed and a reduction in energy consumption
    corecore