7 research outputs found

    Platform for reliable computing on clusters using group communications

    Shared clusters are an excellent platform for executing parallel applications, given their low price/performance ratio and the presence of cluster infrastructure in many organisations. Recent research efforts focus on parallelism management, transparent and efficient access to resources, and making clusters easy to use. In this thesis, we examine reliable parallel computing on clusters. The aim of this research is to demonstrate the feasibility of developing an operating system facility that provides transparent fault tolerance, using existing, enhanced and newly built operating system services to support parallel applications. In particular, we use existing process duplication and process migration services, and synthesise a group communications facility for use in a transparent checkpointing facility. This research is carried out using the methods of experimental computer science. To provide a foundation for the synthesis of the group communications and checkpointing facilities, we survey and review related work in both fields. For group communications, we examine the V Distributed System, the x-kernel and Psync, the ISIS Toolkit, and Horus. We identify a need for services that consider the placement of processes on computers in the cluster. For checkpointing, we examine Manetho, KeyKOS, libckpt, and Diskless Checkpointing. We observe the use of remote computer memories for storing checkpoints, and the use of copy-on-write mechanisms to reduce the time needed to create a checkpoint of a process. We propose a group communications facility providing two sets of services: user-oriented services and system-oriented services. User-oriented services provide transparency and target applications. System-oriented services supplement the user-oriented services by supporting other operating system services, and do not provide transparency. Additional flexibility is achieved by providing delivery and ordering semantics independently.
An operating system facility providing transparent checkpointing is synthesised using coordinated checkpointing. To ensure that the facility generates a consistent set of checkpoints, only non-deterministic events are blocked, instead of blindly blocking the processes of a parallel application. This allows the processes of the parallel application to continue execution during the checkpoint operation. Checkpoints are created by adapting process duplication mechanisms, and checkpoint data is transferred to remote computer memories and disk for storage using the mechanisms of process migration. The services of the group communications facility are used to coordinate the checkpoint operation, and to transport checkpoint data to remote computer memories and disk. Both the group communications facility and the checkpointing facility have been implemented in the GENESIS cluster operating system and provide proof-of-concept. GENESIS uses a microkernel and client-server based operating system architecture, and is demonstrated to provide an appropriate environment for the development of these facilities. We design a number of experiments to test the performance of both the group communications facility and the checkpointing facility, and to provide proof-of-performance. We present our approach to testing, the challenges raised in testing the facilities, and how we overcome them. For group communications, we examine the performance of a number of delivery semantics. Good speed-ups are observed, and system-oriented group communication services are shown to provide significant performance advantages over user-oriented services in the presence of packet loss. For checkpointing, we examine the scalability of the facility given different levels of resource usage and a variable number of computers. Low overheads are observed for checkpointing a parallel application.
This research makes clear that the microkernel and client-server based cluster operating system provides an ideal environment for the development of a high performance group communications facility and a transparent checkpointing facility, which together form a platform for reliable parallel computing on clusters.
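
The idea of blocking only non-deterministic events during a coordinated checkpoint can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the GENESIS implementation: the `Process` class and the `begin_checkpoint`, `take_snapshots` and `end_checkpoint` functions are hypothetical names, message delivery stands in for the non-deterministic events, and a deep copy stands in for the process duplication mechanism the abstract mentions.

```python
import copy
from collections import deque


class Process:
    """Toy process: local computation is deterministic, message arrival is not."""

    def __init__(self, name):
        self.name = name
        self.state = 0          # result of deterministic local computation
        self.received = []      # messages the process has consumed
        self.inbox = deque()    # messages that arrived but are held back
        self.blocked = False    # True while a checkpoint is in progress

    def compute(self):
        # Deterministic step: allowed even during a checkpoint.
        self.state += 1

    def deliver(self, msg):
        # Non-deterministic event: deferred while a checkpoint is in progress.
        self.inbox.append(msg)
        if not self.blocked:
            self.drain()

    def drain(self):
        # Hand all held-back messages to the process, in arrival order.
        while self.inbox:
            self.received.append(self.inbox.popleft())

    def snapshot(self):
        # Deep copy stands in for process duplication (an assumption).
        return copy.deepcopy(
            {"state": self.state,
             "received": self.received,
             "pending": list(self.inbox)}
        )


def begin_checkpoint(processes):
    """Block non-deterministic events on every process."""
    for p in processes:
        p.blocked = True


def take_snapshots(processes):
    """Record one checkpoint per process."""
    return {p.name: p.snapshot() for p in processes}


def end_checkpoint(processes):
    """Unblock every process and release the deferred messages."""
    for p in processes:
        p.blocked = False
        p.drain()


if __name__ == "__main__":
    a, b = Process("a"), Process("b")
    a.compute()
    b.deliver("m1")

    begin_checkpoint([a, b])
    a.compute()          # computation continues during the checkpoint
    b.deliver("m2")      # arrival is held back until the checkpoint ends
    checkpoints = take_snapshots([a, b])
    end_checkpoint([a, b])

    print(checkpoints)
    print(b.received)
```

In this sketch a message that arrives during the checkpoint is recorded as pending rather than consumed, so the checkpoint never reflects a receive that happened part-way through the operation, while local computation is never stalled.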

    Principled Elimination of Microarchitectural Timing Channels through Operating-System Enforced Time Protection

    Microarchitectural timing channels exploit resource contention on a shared hardware platform to leak information through timing variance. These channels threaten system security by providing unauthorised information flow in violation of the system’s security policy. Present operating systems lack the means for systematic prevention of such channels. To address this problem, we propose time protection as an operating system (OS) abstraction, which provides mandatory temporal isolation analogous to the spatial isolation provided by the established memory protection abstraction. In order to fully understand microarchitectural timing channels, we first study all published microarchitectural timing attacks and their countermeasures, and analyse the underlying causes. We then define two application scenarios, a confinement scenario and a cloud scenario, which between them represent a large class of security-critical use cases, and aim to develop a solution that supports both. Our study identifies competition for limited hardware resources as the underlying cause of microarchitectural timing channels. From this we derive the requirement that, for proper isolation, all shared resources must be partitioned, either spatially or temporally (time-shared). We then analyse a number of recent processors across two instruction-set architectures (ISAs), x86 and Arm, for their support for such partitioning. We discover that all examined processors exhibit hardware state that cannot be partitioned by architected means, meaning that they all have uncloseable channels. We define the requirements hardware must satisfy for timing-channel prevention, and propose an augmented ISA as a new, security-oriented hardware-software contract. Assuming conforming hardware, we then define the requirements that OS-provided time protection must satisfy.
We propose a concrete design of time protection, consisting of a set of policy-free mechanisms, and present an implementation in the seL4 microkernel. We evaluate the efficacy and efficiency of the implementation, and show that it is highly effective at closing timing channels, to the degree supported by the underlying hardware. We also find that the performance overheads are small to negligible. We conclude that principled prevention of timing channels is possible through mandatory, black-box enforcement by the OS, subject to hardware manufacturers providing mechanisms for scrubbing all shared microarchitectural state.
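
Spatial partitioning of a shared resource can be illustrated with page colouring of a physically indexed cache. The abstract does not name its partitioning mechanisms, so this is a sketch of one well-known technique under stated assumptions, not the thesis design: the cache geometry, the domain names and every function name here are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def num_colours(cache_bytes, ways, page_size=PAGE_SIZE):
    """Number of page colours: the size of one cache way divided by the page size."""
    way_size = cache_bytes // ways
    return max(1, way_size // page_size)


def page_colour(phys_addr, colours, page_size=PAGE_SIZE):
    """Colour of the physical page that contains phys_addr."""
    return (phys_addr // page_size) % colours


def assign_colours(domains, colours):
    """Give each security domain a disjoint, equally sized set of colours."""
    if len(domains) > colours:
        raise ValueError("more domains than colours")
    share = colours // len(domains)
    return {
        d: set(range(i * share, (i + 1) * share))
        for i, d in enumerate(domains)
    }


def frames_for(domain_colours, frame_addrs, colours):
    """Filter physical frames down to those a domain may be given."""
    return [f for f in frame_addrs if page_colour(f, colours) in domain_colours]


if __name__ == "__main__":
    colours = num_colours(256 * 1024, ways=8)        # hypothetical 256 KiB, 8-way cache
    plan = assign_colours(["high", "low"], colours)
    frames = [n * PAGE_SIZE for n in range(16)]
    for domain, allowed in plan.items():
        print(domain, [hex(f) for f in frames_for(allowed, frames, colours)])
```

Because pages of different colours map to disjoint cache sets, two domains that only receive frames from disjoint colour sets cannot evict each other's lines from that cache. State that cannot be divided this way would instead need temporal partitioning, that is, scrubbing on every domain switch.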

    Systemunterstützung für moderne Speichertechnologien

    Trust and scalability are the two significant factors that impede the dissemination of clouds. The possibility of privileged access to customer data by a cloud provider limits the use of clouds for processing security-sensitive data. Low-latency cloud services rely on in-memory computation and are thus limited by characteristics of Dynamic RAM (DRAM) such as capacity, density and energy consumption. Two technological areas address these factors. Mainstream server platforms offer extensions, such as Intel Software Guard eXtensions (SGX) and AMD Secure Encrypted Virtualisation (SEV), for trusted execution in untrusted environments. Various Non-Volatile RAM (NV-RAM) technologies have better capacity and density than DRAM and can thus be considered as DRAM alternatives in the future. However, these technologies and extensions require new programming approaches and system support, since they add features to the system architecture: new system components (Intel SGX) and data persistence (NV-RAM). This thesis is devoted to the programming and architectural aspects of persistent and trusted systems. For trusted systems, an in-depth analysis of the new architectural extensions was performed. A novel framework named EActors and a database engine named STANlite were developed to use the capabilities of trusted execution effectively. For persistent systems, an in-depth analysis of prospective memory technologies, their features and their possible impact on system architecture was performed. A new persistence model, called the hypervisor-based model of persistence, was developed and evaluated by means of the NV-Hypervisor. This model offers transparent persistence for legacy and proprietary software, and supports virtualisation of persistent memory.

    Object Oriented Transaction Processing in the KeyKOS Microkernel

    No full text
    Three major technological directions in computer technology are transaction processing, object orientation, and microkernel operating systems. The KeyKOS operating system and the KeyTXF transaction processing system combine all three of these technologies. The design of KeyKOS directly provides operating system level objects on a microkernel base. In order to maintain the integrity of these objects, KeyKOS takes periodic checkpoints of the entire system. In addition, KeyKOS provides facilities for transaction processing which achieve very high transaction rates. Object oriented technology facilitates construction and reuse of transaction applications. This paper describes how these ideas are combined in the KeyKOS system.
1 Introduction
This paper examines the structure of an application environment that combines three technologies: transaction processing, object orientation, and microkernel operating systems. The KeyTXF transaction processing system, which runs on the KeyKOS microkern..

    Composing and Decomposing OS Abstractions

    Operating systems (OSes) provide a set of abstractions through which hardware resources are accessed. Abstractions that are closer to hardware offer the greatest opportunity for performance, whereas higher-level abstractions may sacrifice performance but are typically more portable and potentially more secure and robust. The abstractions chosen by OS designs impose a set of trade-offs that will not be well-suited to all applications. In this dissertation, we argue the following thesis: supporting novel hardware such as non-volatile RAM (NVRAM) and new abstractions like fine-grained isolation, while maintaining efficiency, usability, and security goals, requires simultaneous access to both high-level OS abstractions and compatible access to their low-level decompositions. We support this thesis by offering two new abstractions, PTx and light-weight contexts (lwCs), as well as the null-Kernel, a new OS architecture. PTx is a new high-level abstraction for persistence built on top of NVRAM, a new form of persistent byte-addressable memory, whereas lwCs are a new OS abstraction that enables fine-grained intra-process isolation, snapshots and reference monitoring. Due to their efficiency requirements, both PTx and lwCs required access to low-level decompositions of higher-level abstractions, while interoperability requirements dictated that both low-level and high-level abstractions be exposed simultaneously. The null-Kernel is an OS architecture that enables the simultaneous exposure of multiple abstractions for the same underlying hardware in a safe way, which, if adopted, would accelerate the development and deployment of abstractions such as PTx and lwCs.