
    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    In this dissertation, new computing paradigms, architectures, and design philosophies are proposed and evaluated for adopting STT-MRAM technology as a highly reliable, energy-efficient, and fast memory. To this end, a novel cross-layer framework spanning from the cell level all the way up to the system and application levels has been developed. In this framework, reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rate of the entire memory and its Mean Time To Failure (MTTF), taking temperature and process-variation effects into account. Design-time, compile-time, and run-time solutions are provided to address the challenges associated with STT-MRAM. Extensive experiments demonstrate that the proposed solutions significantly improve on state-of-the-art solutions, yielding lower-power, higher-performance, and more reliable STT-MRAM designs.

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10⁻⁶, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10⁻⁶. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. Although larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing; it is the first accurate model for small and medium amounts of redundancy. Previous models are also extended to account for dependence between the inputs, producing more accurate results.
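
    The kind of yield model mentioned above can be illustrated with a textbook binomial calculation. This is a sketch under stated assumptions, not the paper's model: devices are assumed to fail independently with probability p, a block of n devices is assumed to survive if at most `tolerated` of them are defective (for example through spare rows), and the chip is assumed to need every block. `block_yield` and `chip_yield` are hypothetical names.

```python
from math import comb


def block_yield(n, tolerated, p):
    """P(at most `tolerated` of n independent devices are defective).

    Assumption: each device is defective independently with probability p.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(tolerated + 1))


def chip_yield(blocks, n, tolerated, p):
    """The chip works only if every one of its independent blocks works."""
    return block_yield(n, tolerated, p) ** blocks
```

With `tolerated=0` this reduces to the non-redundant yield (1 - p)^n per block; raising `tolerated` buys yield at the cost of extra hardware, which is the redundancy-versus-yield trade-off the abstract refers to.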

    Proving the Absence of Microarchitectural Timing Channels

    Microarchitectural timing channels are a major threat to computer security. A set of OS mechanisms called time protection was recently proposed as a principled way of preventing information leakage through such channels and was prototyped in the seL4 microkernel. We formalise time protection and the underlying hardware mechanisms in a way that allows linking them to the information-flow proofs that showed the absence of storage channels in seL4. Comment: Scott Buckley and Robert Sison were joint lead authors.

    Envisioning a Safety Island to Enable HPC Devices in Safety-Critical Domains

    HPC (High Performance Computing) devices are increasingly becoming the only alternative able to deliver the performance needed in safety-critical autonomous systems (e.g., autonomous cars, unmanned aircraft), because they deploy large and powerful multicores along with accelerators such as GPUs. However, the support these HPC devices offer for building safety-critical systems on top of them is heterogeneous. Safety islands have been devised to be coupled to HPC devices and complement them so that they meet the safety requirements of a wider set of applications, yet the variety of concepts and realizations is large. This paper presents our own safety island concept, designed with two goals in mind: (1) offering a wide set of features to enable the broadest set of safety applications for each HPC device, and (2) being realized with open-source components based on the RISC-V ISA to ease its use and adoption. In particular, we present our safety island concept, the key features we foresee it should include, and its potential applications beyond safety. Comment: White paper.

    Static Probabilistic Timing Analysis in Presence of Faults

    Accurate timing prediction for software execution is becoming a problem due to the increasing complexity of computer architectures and the presence of mixed-criticality workloads. Probabilistic caches were proposed to bound Worst Case Execution Time (WCET) estimates and help designers improve system resource usage. However, as technology scales down, system fault rates increase and timing behavior is affected. In this paper, we propose a Static Probabilistic Timing Analysis (SPTA) approach for caches with an evict-on-miss random replacement policy, using a state space modeling technique that considers the impact of faults on both the timing analysis and the task WCET. Different scenarios of transient and permanent faults are investigated. Results show that our approach provides tight probabilistic WCET (pWCET) estimates and that, as the fault rate increases, the timing behavior of the system can be significantly affected.
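
    The state-space idea can be illustrated with a small exact enumeration. This is a minimal sketch under stated assumptions, not the paper's implementation: a single fully associative cache set that starts empty, evict-on-miss random replacement, and permanent faults modeled as disabled ways. It propagates a probability distribution over (cache contents, misses so far) access by access, then turns the resulting miss-count distribution into a pWCET-style exceedance curve. `miss_distribution`, `exceedance`, and the latencies `hit_cycles`/`miss_cycles` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def miss_distribution(trace, ways, faulty_ways=0):
    """Exact probability distribution of the number of misses for `trace`.

    Illustrative assumptions (not the paper's exact model):
    - one fully associative cache set that starts empty;
    - evict-on-miss random replacement: every miss evicts a way chosen
      uniformly at random, even while other ways are still empty;
    - permanently faulty ways are disabled and never hold data.
    """
    healthy = ways - faulty_ways
    if healthy <= 0:
        # No usable way is left, so every access misses.
        return {len(trace): Fraction(1)}
    # Joint state: (contents of each healthy way, misses so far) -> probability.
    dist = {((None,) * healthy, 0): Fraction(1)}
    for block in trace:
        nxt = defaultdict(Fraction)
        for (contents, misses), p in dist.items():
            if block in contents:
                nxt[(contents, misses)] += p  # hit: cache state unchanged
            else:
                for w in range(healthy):  # miss: evict way w with prob 1/healthy
                    new = contents[:w] + (block,) + contents[w + 1:]
                    nxt[(new, misses + 1)] += p / healthy
        dist = nxt
    result = defaultdict(Fraction)
    for (_, misses), p in dist.items():
        result[misses] += p
    return dict(result)


def exceedance(trace, ways, faulty_ways=0, hit_cycles=1, miss_cycles=10):
    """pWCET-style curve: list of (t, P(execution time > t)) pairs."""
    if miss_cycles <= hit_cycles:
        raise ValueError("sketch assumes a miss costs more than a hit")
    n = len(trace)
    dist = miss_distribution(trace, ways, faulty_ways)
    curve = []
    for m in sorted(dist):
        t = m * miss_cycles + (n - m) * hit_cycles
        # Time grows with the miss count, so P(time > t) = P(misses > m).
        curve.append((t, sum(p for k, p in dist.items() if k > m)))
    return curve
```

For the trace a, b, a in a two-way set, the sketch gives two or three misses with probability 1/2 each (the miss on b may evict a); with one of the two ways permanently faulty, all three accesses miss. Exact `Fraction` arithmetic keeps the probabilities free of rounding error, at the cost of a state space that grows quickly with associativity and the number of distinct blocks.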

    Attributes of fault-tolerant distributed file systems

    Fault tolerance in distributed file systems is investigated by analyzing the recovery techniques and concepts implemented within two models of distributed systems: the pool-processor model and the user-server model. The research provides an overview of fault-tolerance characteristics and mechanisms in current implementations and summarizes future directions for fault-tolerant distributed file systems.

    On the Use of Migration to Stop Illicit Channels

    Side and covert channels (collectively referred to as illicit channels) are an insidious affliction of high-security systems, brought about by the unwanted and unregulated sharing of state among processes. Illicit channels can be effectively broken through isolation, which limits the degree to which processes can interact. The drawback of using isolation as a general mitigation against illicit channels is that it can be very wasteful when employed naively. In particular, permanently isolating every tenant of a public cloud service on its own separate machine would completely undermine the economics of cloud computing, as it would remove the advantages of consolidation. On closer inspection, only a subset of a tenant's activities is sufficiently security-sensitive to merit strong isolation. Moreover, it is not generally necessary to maintain isolation indefinitely, nor must isolation always be procured at the machine level. This work builds on these observations by exploring a fine-grained, hierarchical model of isolation in which fractions of a machine can be isolated dynamically using migration. Using different units of isolation allows a system to isolate processes from each other with a minimum of over-allocated resources, and a dynamic, reconfigurable model enables isolation to be procured on demand. The model is then realised as an implemented framework that allows the fine-grained provisioning of units of computation, managing migrations at the core, virtual CPU, process group, process/container, and virtual machine levels. Use of this framework is demonstrated in detecting and mitigating a machine-wide covert channel and in implementing a multi-level moving target defence. Finally, this work describes the extension of post-copy live migration mechanisms to allow temporary virtual machine migration. This adds the ability to isolate a virtual machine on a short-term basis, which in turn allows migrations to happen more frequently and with fewer redundant memory transfers, and also creates the opportunity to time-share a particular physical machine's features among a set of tenants' virtual machines.

    Static Probabilistic Timing Analysis for Real-Time Embedded Systems in Presence of Faults

    A cache is typically the bridge between a processor and its main memory. It significantly reduces memory-block access latencies and therefore strongly influences the timing behavior of critical real-time embedded systems (CRTESs). Random caches (caches with a random replacement policy) have been proposed to improve timing-behavior estimates for CRTESs by reducing pathological cases due to systematic cache misses. Measurement-Based Probabilistic Timing Analysis (MBPTA) and Static Probabilistic Timing Analysis (SPTA) aim to provide safe probabilistic Worst Case Execution Time (pWCET) estimates for random caches. In this dissertation, we present research on timing estimation based on SPTA. State-of-the-art SPTA methodologies produce safe and tight pWCET estimates. However, as semiconductor technology scales down, CRTES components, especially their on-chip caches, become prone to faults. Consequently, we developed SPTA methodologies to estimate pWCETs in the presence of faults and evaluated the impact of faults on timing behavior. To investigate faults, we first defined transient and permanent fault models. A transient fault represents a temporary change of state; a system with transient faults can be recovered using fault detection and correction techniques. A permanent fault represents a permanent change of state: it persists after its occurrence and affects the system's behavior from then on.
Additionally, we proposed a Markov chain method to model memory layout states. For each memory-block access, the state change is computed using a transition matrix. The impact of transient faults was integrated into the transition-matrix computation, and different groups of Markov chain models were used to represent the system with different numbers of permanent faults. Experiments showed that our SPTA method provides accurate results in the presence of both transient and permanent faults.
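
    The Markov chain method described above can be sketched as follows. This is a minimal illustration, not the dissertation's method: it assumes a single fully associative set that starts empty, evict-on-miss random replacement, and a hypothetical transient-fault model in which, with probability q, the line holding an accessed block is corrupted, the fault is detected, and the access is served as a miss. Rather than materializing the whole transition matrix, `transition_row` computes the row for the current state on demand, and `hit_probabilities` multiplies the state distribution by it access by access. Both names are illustrative.

```python
def transition_row(state, block, q):
    """One row of the per-access transition matrix: {next_state: probability}.

    Hypothetical transient-fault model: with probability q the line holding
    an accessed block has been corrupted; the fault is detected, the line is
    invalidated, and the access is served as an evict-on-miss random miss.
    """
    ways = len(state)
    row = {}

    def add_miss(contents, weight):
        # Evict-on-miss random replacement: each way is chosen with prob 1/ways.
        for w in range(ways):
            nxt = contents[:w] + (block,) + contents[w + 1:]
            row[nxt] = row.get(nxt, 0.0) + weight / ways

    if block in state:
        row[state] = row.get(state, 0.0) + (1.0 - q)  # fault-free hit
        w = state.index(block)
        add_miss(state[:w] + (None,) + state[w + 1:], q)  # faulty hit -> miss
    else:
        add_miss(state, 1.0)
    return row


def hit_probabilities(trace, ways, q=0.0):
    """Probability that each access in `trace` hits, starting from an empty set."""
    dist = {(None,) * ways: 1.0}
    hits = []
    for block in trace:
        # A hit needs the block to be cached and its line to be fault-free.
        hits.append(sum(p for s, p in dist.items() if block in s) * (1.0 - q))
        nxt = {}
        for s, p in dist.items():  # distribution x transition matrix
            for t, pt in transition_row(s, block, q).items():
                nxt[t] = nxt.get(t, 0.0) + p * pt
        dist = nxt
    return hits
```

Setting q=0 recovers the fault-free chain; permanent faults could be handled in the same spirit by running a separate chain per number of disabled ways, as the abstract describes.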

    The construction of recoverable multi-level systems

    PhD Thesis. This thesis describes system structures and data structures that make it possible to restore the state of user objects. Recovery is linked with types, which suggests distinguishing between recoverable and unrecoverable types. For convenience, recovery is discussed in terms of recovery blocks as developed at the University of Newcastle upon Tyne, and is taken to mean restoring the values of recoverable types. Recoverable multi-level systems are considered. On the one hand, levels in such systems can be backed out; on the other hand, these levels provide explicit recovery for the new types they introduce, and so can be called on to restore the states of objects used in higher levels. The concepts and issues are discussed and explained, and mechanisms and techniques for building such systems are presented. Recovery techniques for complex global data structures, and techniques for maintaining consistency at all times, even when recovery is impossible (such as after a crash), are described and compared. Many of the presented techniques are employed in an implemented recoverable two-level system with a recoverable filing system, which is described in detail. It is argued that to implement recoverability in multi-level systems efficiently and flexibly, the system's interfaces should provide both recoverable and unrecoverable types. It is also shown that the way complex data structures are updated is of major importance if recovery is to be provided "reasonably" efficiently and consistency is to be guaranteed after a crash. Netherlands Organisation for the Advancement of Pure Research.
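
    The recovery-block structure this thesis builds on (ensure an acceptance test by a primary, else by an alternate, else signal an error) can be sketched as below. This is an illustration only: it assumes the recoverable state is a plain Python dict checkpointed by deep copy at the recovery point, whereas the thesis attaches recovery to recoverable types within a multi-level system. `recovery_block` and `RecoveryBlockFailure` are hypothetical names.

```python
import copy


class RecoveryBlockFailure(Exception):
    """Raised when the primary and every alternate fail."""


def recovery_block(state, acceptance_test, primary, *alternates):
    """ensure acceptance_test by primary else by alternates... else error.

    `state` is a dict standing in for a recoverable object (an assumption);
    it is restored in place to the recovery point before each retry.
    """
    recovery_point = copy.deepcopy(state)  # establish the recovery point
    for attempt in (primary, *alternates):
        try:
            attempt(state)
            if acceptance_test(state):
                return state  # accepted: the recovery point is discarded
        except Exception:
            pass  # a raised error counts as a failed attempt
        # Back out: restore the state saved at the recovery point. A fresh
        # deep copy keeps later attempts from mutating the saved checkpoint.
        state.clear()
        state.update(copy.deepcopy(recovery_point))
    raise RecoveryBlockFailure("no alternate passed the acceptance test")
```

If every alternate fails, the caller receives an exception while the state is left exactly as it was at the recovery point, which is the "backing out" of a level that the abstract describes.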