335 research outputs found

    Towards trustworthy computing on untrustworthy hardware

    Get PDF
    Historically, hardware was thought to be inherently secure and trusted due to its obscurity and the isolated nature of its design and manufacturing. In the last two decades, however, hardware trust and security have emerged as pressing issues. Modern day hardware is surrounded by threats manifested mainly in undesired modifications by untrusted parties in its supply chain, unauthorized and pirated selling, injected faults, and system and microarchitectural level attacks. These threats, if realized, are expected to push hardware to abnormal and unexpected behaviour causing real-life damage and significantly undermining our trust in the electronic and computing systems we use in our daily lives and in safety critical applications. A large number of detective and preventive countermeasures have been proposed in literature. It is a fact, however, that our knowledge of potential consequences to real-life threats to hardware trust is lacking given the limited number of real-life reports and the plethora of ways in which hardware trust could be undermined. With this in mind, run-time monitoring of hardware combined with active mitigation of attacks, referred to as trustworthy computing on untrustworthy hardware, is proposed as the last line of defence. This last line of defence allows us to face the issue of live hardware mistrust rather than turning a blind eye to it or being helpless once it occurs. This thesis proposes three different frameworks towards trustworthy computing on untrustworthy hardware. The presented frameworks are adaptable to different applications, independent of the design of the monitored elements, based on autonomous security elements, and are computationally lightweight. The first framework is concerned with explicit violations and breaches of trust at run-time, with an untrustworthy on-chip communication interconnect presented as a potential offender. The framework is based on the guiding principles of component guarding, data tagging, and event verification. The second framework targets hardware elements with inherently variable and unpredictable operational latency and proposes a machine-learning based characterization of these latencies to infer undesired latency extensions or denial of service attacks. The framework is implemented on a DDR3 DRAM after showing its vulnerability to obscured latency extension attacks. The third framework studies the possibility of the deployment of untrustworthy hardware elements in the analog front end, and the consequent integrity issues that might arise at the analog-digital boundary of system on chips. The framework uses machine learning methods and the unique temporal and arithmetic features of signals at this boundary to monitor their integrity and assess their trust level

    GPU devices for safety-critical systems: a survey

    Get PDF
    Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution.This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020- 045931-I).Peer ReviewedPostprint (author's final draft

    Low Power Memory/Memristor Devices and Systems

    Get PDF
    This reprint focusses on achieving low-power computation using memristive devices. The topic was designed as a convenient reference point: it contains a mix of techniques starting from the fundamental manufacturing of memristive devices all the way to applications such as physically unclonable functions, and also covers perspectives on, e.g., in-memory computing, which is inextricably linked with emerging memory devices such as memristors. Finally, the reprint contains a few articles representing how other communities (from typical CMOS design to photonics) are fighting on their own fronts in the quest towards low-power computation, as a comparison with the memristor literature. We hope that readers will enjoy discovering the articles within

    Signal and Information Processing Methods for Embedded Robotic Tactile Sensing Systems

    Get PDF
    The human skin has several sensors with different properties and responses that are able to detect stimuli resulting from mechanical stimulations. Pressure sensors are the most important type of receptors for the exploration and manipulation of objects. In the last decades, smart tactile sensing based on different sensing techniques have been developed as their application in robotics and prosthetics is considered of huge interest, mainly driven by the prospect of autonomous and intelligent robots that can interact with the environment. However, regarding object properties estimation on robots, hardness detection is still a major limitation due to the lack of techniques to estimate it. Furthermore, finding processing methods that can interpret the measured information from multiple sensors and extract relevant information is a Challenging task. Moreover, embedding processing methods and machine learning algorithms in robotic applications to extract meaningful information such as object properties from tactile data is an ongoing challenge, which is controlled by the device constraints (power constraint, memory constraints, etc.), the computational complexity of the processing and machine learning algorithms, the application requirements (real-time operations, high prediction performance). In this dissertation, we focus on the design and implementation of pre-processing methods and machine learning algorithms to handle the aforementioned challenges for a tactile sensing system in robotic application. First, we propose a tactile sensing system for robotic application. Then we present efficient preprocessing and feature extraction methods for our tactile sensors. Then we propose a learning strategy to reduce the computational cost of our processing unit in object classification using sensorized Baxter robot. Finally, we present a real-time robotic tactile sensing system for hardness classification on a resource-constrained devices. The first study represents a further assessment of the sensing system that is based on the PVDF sensors and the interface electronics developed in our lab. In particular, first, it presents the development of a skin patch (multilayer structure) that allows us to use the sensors in several applications such as robotic hand/grippers. Second, it shows the characterization of the developed skin patch. Third, it validates the sensing system. Moreover, we designed a filter to remove noise and detect touch. The experimental assessment demonstrated that the developed skin patch and the interface electronics indeed can detect different touch patterns and stimulus waveforms. Moreover, the results of the experiments defined the frequency range of interest and the response of the system to realistic interactions with the sensing system to grasp and release events. In the next study, we presented an easy integration of our tactile sensing system into Baxter gripper. Computationally efficient pre-processing techniques were designed to filter the signal and extract relevant information from multiple sensor signals, in addition to feature extraction methods. These processing methods aim in turn to reduce also the computational complexity of machine learning algorithms utilized for object classification. The proposed system and processing strategy were evaluated on object classification application by integrating our system into the gripper and we collected data by grasping multiple objects. We further proposed a learning strategy to accomplish a trade-off between the generalization accuracy and the computational cost of the whole processing unit. The proposed pre-processing and feature extraction techniques together with the learning strategy have led to models with extremely low complexity and very high generalization accuracy. Moreover, the support vector machine achieved the best trade-off between accuracy and computational cost on tactile data from our sensors. Finally, we presented the development and implementation on the edge of a real–time tactile sensing system for hardness classification on Baxter robot based on machine and deep learning algorithms. We developed and implemented in plain C a set of functions that provide the fundamental layer functionalities of the Machine learning and Deep Learning models (ML and DL), along with the pre–processing methods to extract the features and normalize the data. The models can be deployed to any device that supports C code since it does not rely on any of the existing libraries. Shallow ML/DL algorithms for the deployment on resource–constrained devices are designed. To evaluate our work, we collected data by grasping objects of different hardness and shape. Two classification problems were addressed: 5 levels of hardness classified on the same objects’ shape, and 5 levels of hardness classified on two different objects’ shape. Furthermore, optimization techniques were employed. The models and pre–processing were implemented on a resource constrained device, where we assessed the performance of the system in terms of accuracy, memory footprint, time latency, and energy consumption. We achieved for both classification problems a real-time inference (< 0.08 ms), low power consumption (i.e., 3.35 μJ), extremely small models (i.e., 1576 Byte), and high accuracy (above 98%)

    Memory Safety Acceleration on RISC-V for C Programming Language

    Full text link
    Memory corruption vulnerabilities can lead to memory attacks. Three of the top ten most dangerous weaknesses in computer security are memory-related. Memory attack is one of a computer system’s oldest but everlasting problems. Companies and the government lost billions of dollars due to memory security breaches. Memory safety is paramount to securing memory systems. Pointer-based memory safety protection has been shown as a promising solution covering both out-of-bounds and use-after-free errors. However, pointer-based memory safety relies on additional information (metadata) to check validity when a pointer is dereferenced. Such operations on the metadata introduce significant performance overhead to the system. Existing hardware/software implementations are primarily limited to proprietary closed-source microprocessors, simulation-only studies, or require changes to the input source code. In order to provide the need for memory security, we created a memory-safe RISC-V platform with low-performance overhead. In this thesis, a novel hardware/software co-design methodology consisting of a RISC-V based processor is extended with new instructions and microarchitecture enhancements, enabling complete memory safety in the C programming language and faster memory safety checks. Furthermore, a compiler is instrumented to provide security operations considering the changes to the processor. Moreover, a design exploration framework is proposed to provide an in-depth search for optimal hardware/software configuration for application-specific workloads regarding performance overhead, security coverage, area cost, and critical path latency. The entire system is realized by enhancing a RISC-V Rocket-chip system-on-chip (SoC). The resultant processor SoC is implemented on an FPGA and evaluated with applications from SPEC 2006 (for generic applications), MiBench (for embedded applications), and Olden benchmark suites for performance. The system, including the RISC-V CHISEL, compiler, profiling and analysis tool-chain, is fully available and open-source to the public

    Co-designing reliability and performance for datacenter memory

    Get PDF
    Memory is one of the key components that affects reliability and performance of datacenter servers. Memory in today’s servers is organized and shared in several ways to provide the most performant and efficient access to data. For example, cache hierarchy in multi-core chips to reduce access latency, non-uniform memory access (NUMA) in multi-socket servers to improve scalability, disaggregation to increase memory capacity. In all these organizations, hardware coherence protocols are used to maintain memory consistency of this shared memory and implicitly move data to the requesting cores. This thesis aims to provide fault-tolerance against newer models of failure in the organization of memory in datacenter servers. While designing for improved reliability, this thesis explores solutions that can also enhance performance of applications. The solutions build over modern coherence protocols to achieve these properties. First, we observe that DRAM memory system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware driven replication mechanism where data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Data blocks are accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Dvé’s organization offers two independent points of access to data which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory and (b) higher performance by providing another nearer point of memory access. Dvé’s coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can flexibly provide these benefits on-demand at runtime. Next, we observe that the coherence protocol itself requires to be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region in the disaggregated memory. However, the CXL specification lacks the requisite level of fault-tolerance necessary to operate at an inter-server scale within the datacenter. Compute servers can fail or be unresponsive in the datacenter and therefore, it is important that the coherence protocol remain available in the presence of such failures. The thesis proposes Āpta, a CXL-based, shared disaggregated memory system for keeping the cached data consistent without compromising availability in the face of compute server failures. Āpta architects a high-performance fault-tolerant object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications

    L2C2: Last-level compressed-contents non-volatile cache and a procedure to forecast performance and lifetime

    Get PDF
    Several emerging non-volatile (NV) memory technologies are rising as interesting alternatives to build the Last-Level Cache (LLC). Their advantages, compared to SRAM memory, are higher density and lower static power, but write operations wear out the bitcells to the point of eventually losing their storage capacity. In this context, this paper presents a novel LLC organization designed to extend the lifetime of the NV data array and a procedure to forecast in detail the capacity and performance of such an NV-LLC over its lifetime. From a methodological point of view, although different approaches are used in the literature to analyze the degradation of an NV-LLC, none of them allows to study in detail its temporal evolution. In this sense, this work proposes a forecasting procedure that combines detailed simulation and prediction, allowing an accurate analysis of the impact of different cache control policies and mechanisms (replacement, wear-leveling, compression, etc.) on the temporal evolution of the indices of interest, such as the effective capacity of the NV-LLC or the system IPC. We also introduce L2C2, a LLC design intended for implementation in NV memory technology that combines fault tolerance, compression, and internal write wear leveling for the first time. Compression is not used to store more blocks and increase the hit rate, but to reduce the write rate and increase the lifetime during which the cache supports near-peak performance. In addition, to support byte loss without performance drop, L2C2 inherently allows N redundant bytes to be added to each cache entry. Thus, L2C2+N, the endurance-scaled version of L2C2, allows balancing the cost of redundant capacity with the benefit of longer lifetime. For instance, as a use case, we have implemented the L2C2 cache with STT-RAM technology. It has affordable hardware overheads compared to that of a baseline NV-LLC without compression in terms of area, latency and energy consumption, and increases up to 6-37 times the time in which 50% of the effective capacity is degraded, depending on the variability in the manufacturing process. Compared to L2C2, L2C2+6 which adds 6 bytes of redundant capacity per entry, that means 9.1% of storage overhead, can increase up to 1.4-4.3 times the time in which the system gets its initial peak performance degraded

    It is too hot in here! A performance, energy and heat aware scheduler for Asymmetric multiprocessing processors in embedded systems.

    Get PDF
    Modern architecture present in self-power devices such as mobiles or tablet computers proposes the use of asymmetric processors that allow either energy-efficient or performant computation on the same SoC. For energy efficiency and performance consideration, the asymmetry resides in differences in CPU micro-architecture design and results in diverging raw computing capability. Other components such as the processor memory subsystem also show differences resulting in different memory transaction timing. Moreover, based on a bus-snoop protocol, cache coherency between processors comes with a peculiarity in memory latency depending on the processors operating frequencies. All these differences come with challenging decisions on both application schedulability and processor operating frequencies. In addition, because of the small form factor of such embedded systems, these devices generally cannot afford active cooling systems. Therefore thermal mitigation relies on dynamic software solutions. Current operating systems for embedded systems such as Linux or Android do not consider all these particularities. As such, they often fail to satisfy user expectations of a powerful device with long battery life. To remedy this situation, this thesis proposes a unified approach to deliver high-performance and energy-efficiency computation in each of its flavours, considering the memory subsystem and all computation units available in the system. Performance is maximized even when the device is under heavy thermal constraints. The proposed unified solution is based on accurate models targeting both performance and thermal behaviour and resides at the operating systems kernel level to manage all running applications in a global manner. Particularly, the performance model considers both the computation part and also the memory subsystem of symmetric or asymmetric processors present in embedded devices. The thermal model relies on the accurate physical thermal properties of the device. Using these models, application schedulability and processor frequency scaling decisions to either maximize performance or energy efficiency within a thermal budget are extensively studied. To cover a large range of application behaviour, both models are built and designed using a generative workload that considers fine-grain details of the underlying microarchitecture of the SoC. Therefore, this approach can be derived and applied to multiple devices with little effort. Extended evaluation on real-world benchmarks for high performance and general computing, as well as common applications targeting the mobile and tablet market, show the accuracy and completeness of models used in this unified approach to deliver high performance and energy efficiency under high thermal constraints for embedded devices

    Impact of thermal material properties and local ion concentration on Ti/HfOx-based analog devices

    Get PDF
    The realization of adaptive oxides for neuromorphic computing hinges on repeatable and predicable analog changes of electrical resistance, which is fundamentally controlled by the materials in and around the device. This work focuses on filamentary memristors which exhibit a non-volatile change in resistance by modulating the concentration of oxygen vacancies within a small (filamentary) region of an otherwise insulating oxide layer in a metal-insulator-metal (MIM) stack. Under an applied electric field, these devices experience localized temperature rises over 1000 K on picosecond timescales, with drift, diffusion, and thermophoresis causing the migration of oxygen ions and oxygen vacancies. All three of these mechanisms have a strong dependence on temperature. Therefore, the management of the thermal field is crucial to successful implementation of these materials and devices. This dissertation independently establishes the impact of the substrate and electrode thermal conductivity both experimentally and computationally. For biologically realistic pulse widths, low substrate thermal conductivities led to increased resistance changes in RRAM devices. Furthermore, scanning thermal microscopy was used to compare the in-situ temperature rise of the top electrode directly above the filament with the estimated value from the model. This established a method to estimate the filament temperature during biasing with an accuracy ~30 K. Computational results demonstrated the temperature of the capping layer (between the oxide and the top electrode) had the greatest impact on the resistance change. Thus, a low thermal conductivity capping layer led to significantly higher resistance changes. Further work exploring the importance of the capping layer revealed that slightly higher initial oxygen concentrations (~2 - 3%) caused larger resistance changes compared to lower concentrations. In summary, this work establishes the importance of the thermal properties not only in contact with the filament, but also far away (substrate and electrodes) and establishes the importance of understanding the interplay between the filament and the capping layer to further improve the analog resistance change of filamentary RRAMs.Ph.D
    • …
    corecore