100 research outputs found

    Instant restore after a media failure

    Full text link
    Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume. Previous work introduced a technique called single-pass restore, which increases restore bandwidth and thus substantially decreases time to repair. Instant restore goes further as it permits read/write access to any data on a device undergoing restore--even data not yet restored--by restoring individual data segments on demand. Thus, the restore process is guided primarily by the needs of applications, and the observed mean time to repair is effectively reduced from several hours to a few seconds. This paper presents an implementation and evaluation of instant restore. The technique is incrementally implemented on a system starting with the traditional ARIES design for logging and recovery. Experiments show that the transaction latency perceived after a media failure can be cut down to less than a second and that the overhead imposed by the technique on normal processing is minimal. The net effect is that a few "nines" of availability are added to the system using simple and low-overhead software techniques

    Compilation Optimizations to Enhance Resilience of Big Data Programs and Quantum Processors

    Get PDF
    Modern computers can experience a variety of transient errors due to the surrounding environment, known as soft faults. Although the frequency of these faults is low enough to not be noticeable on personal computers, they become a considerable concern during large-scale distributed computations or systems in more vulnerable environments like satellites. These faults occur as a bit flip of some value in a register, operation, or memory during execution. They surface as either program crashes, hangs, or silent data corruption (SDC), each of which can waste time, money, and resources. Hardware methods, such as shielding or error correcting memory (ECM), exist, though they can be difficult to implement, expensive, and may be limited to only protecting against errors in specific locations. Researchers have been exploring software detection and correction methods as an alternative, commonly trading either overhead in execution time or memory usage to protect against faults. Quantum computers, a relatively recent advancement in computing technology, experience similar errors on a much more severe scale. The errors are more frequent, costly, and difficult to detect and correct. Error correction algorithms like Shor’s code promise to completely remove errors, but they cannot be implemented on current noisy intermediate-scale quantum (NISQ) systems due to the low number of available qubits. Until the physical systems become large enough to support error correction, researchers instead have been studying other methods to reduce and compensate for errors. In this work, we present two methods for improving the resilience of classical processes, both single- and multi-threaded. We then introduce quantum computing and compare the nature of errors and correction methods to previous classical methods. We further discuss two designs for improving compilation of quantum circuits. One method, focused on quantum neural networks (QNNs), takes advantage of partial compilation to avoid recompiling the entire circuit each time. The other method is a new approach to compiling quantum circuits using graph neural networks (GNNs) to improve the resilience of quantum circuits and increase fidelity. By using GNNs with reinforcement learning, we can train a compiler to provide improved qubit allocation that improves the success rate of quantum circuits

    Leveraging Non-Volatile Memory in Modern Storage Management Architectures

    Get PDF
    Non-volatile memory technologies (NVM) introduce a novel class of devices that combine characteristics of both storage and main memory. Like storage, NVM is not only persistent, but also denser and cheaper than DRAM. Like DRAM, NVM is byte-addressable and has lower access latency. In recent years, NVM has gained a lot of attention both in academia and in the data management industry, with views ranging from skepticism to over excitement. Some critics claim that NVM is not cheap enough to replace flash-based SSDs nor is it fast enough to replace DRAM, while others see it simply as a storage device. Supporters of NVM have observed that its low latency and byte-addressability requires radical changes and a complete rewrite of storage management architectures. This thesis takes a moderate stance between these two views. We consider that, while NVM might not replace flash-based SSD or DRAM in the near future, it has the potential to reduce the gap between them. Furthermore, treating NVM as a regular storage media does not fully leverage its byte-addressability and low latency. On the other hand, completely redesigning systems to be NVM-centric is impractical. Proposals that attempt to leverage NVM to simplify storage management result in completely new architectures that face the same challenges that are already well-understood and addressed by the traditional architectures. Therefore, we take three common storage management architectures as a starting point, and propose incremental changes to enable them to better leverage NVM. First, in the context of log-structured merge-trees, we investigate the impact of storing data in NVM, and devise methods to enable small granularity accesses and NVM-aware caching policies. Second, in the context of B+Trees, we propose to extend the buffer pool and describe a technique based on the concept of optimistic consistency to handle corrupted pages in NVM. Third, we employ NVM to enable larger capacity and reduced costs in a index+log key-value store, and combine it with other techniques to build a system that achieves low tail latency. This thesis aims to describe and evaluate these techniques in order to enable storage management architectures to leverage NVM and achieve increased performance and lower costs, without major architectural changes.:1 Introduction 1.1 Non-Volatile Memory 1.2 Challenges 1.3 Non-Volatile Memory & Database Systems 1.4 Contributions and Outline 2 Background 2.1 Non-Volatile Memory 2.1.1 Types of NVM 2.1.2 Access Modes 2.1.3 Byte-addressability and Persistency 2.1.4 Performance 2.2 Related Work 2.3 Case Study: Persistent Tree Structures 2.3.1 Persistent Trees 2.3.2 Evaluation 3 Log-Structured Merge-Trees 3.1 LSM and NVM 3.2 LSM Architecture 3.2.1 LevelDB 3.3 Persistent Memory Environment 3.4 2Q Cache Policy for NVM 3.5 Evaluation 3.5.1 Write Performance 3.5.2 Read Performance 3.5.3 Mixed Workloads 3.6 Additional Case Study: RocksDB 3.6.1 Evaluation 4 B+Trees 4.1 B+Tree and NVM 4.1.1 Category #1: Buffer Extension 4.1.2 Category #2: DRAM Buffered Access 4.1.3 Category #3: Persistent Trees 4.2 Persistent Buffer Pool with Optimistic Consistency 4.2.1 Architecture and Assumptions 4.2.2 Embracing Corruption 4.3 Detecting Corruption 4.3.1 Embracing Corruption 4.4 Repairing Corruptions 4.5 Performance Evaluation and Expectations 4.5.1 Checksums Overhead 4.5.2 Runtime and Recovery 4.6 Discussion 5 Index+Log Key-Value Stores 5.1 The Case for Tail Latency 5.2 Goals and Overview 5.3 Execution Model 5.3.1 Reactive Systems and Actor Model 5.3.2 Message-Passing Communication 5.3.3 Cooperative Multitasking 5.4 Log-Structured Storage 5.5 Networking 5.6 Implementation Details 5.6.1 NVM Allocation on RStore 5.6.2 Log-Structured Storage and Indexing 5.6.3 Garbage Collection 5.6.4 Logging and Recovery 5.7 Systems Operations 5.8 Evaluation 5.8.1 Methodology 5.8.2 Environment 5.8.3 Other Systems 5.8.4 Throughput Scalability 5.8.5 Tail Latency 5.8.6 Scans 5.8.7 Memory Consumption 5.9 Related Work 6 Conclusion Bibliography A PiBenc

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Get PDF
    Technology scaling accompanied with higher operating frequencies and the ability to integrate more functionality in the same chip has been the driving force behind delivering higher performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor System-on-Chip). However these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing is increased. In this thesis, we take on these challenges within the context of streaming applications running in network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system-level, in particular by exploiting redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on the lifetime reliability. We propose two recovery schemes based on a checkpoint-and-rollback and a rollforward technique. For the latter, we propose two variants of a monitor-controller- adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that it can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture

    Platform for reliable computing on clusters using group communications

    Full text link
    Shared clusters represent an excellent platform for the execution of parallel applications given their low price/performance ratio and the presence of cluster infrastructure in many organisations. The focus of recent research efforts are on parallelism management, transport and efficient access to resources, and making clusters easy to use. In this thesis, we examine reliable parallel computing on clusters. The aim of this research is to demonstrate the feasibility of developing an operating system facility providing transport fault tolerance using existing, enhanced and newly built operating system services for supporting parallel applications. In particular, we use existing process duplication and process migration services, and synthesise a group communications facility for use in a transparent checkpointing facility. This research is carried out using the methods of experimental computer science. To provide a foundation for the synthesis of the group communications and checkpointing facilities, we survey and review related work in both fields. For group communications, we examine the V Distributed System, the x-kernel and Psync, the ISIS Toolkit, and Horus. We identify a need for services that consider the placement of processes on computers in the cluster. For Checkpointing, we examine Manetho, KeyKOS, libckpt, and Diskless Checkpointing. We observe the use of remote computer memories for storing checkpoints, and the use of copy-on-write mechanisms to reduce the time to create a checkpoint of a process. We propose a group communications facility providing two sets of services: user-oriented services and system-oriented services. User-oriented services provide transparency and target application. System-oriented services supplement the user-oriented services for supporting other operating systems services and do not provide transparency. Additional flexibility is achieved by providing delivery and ordering semantics independently. An operating system facility providing transparent checkpointing is synthesised using coordinated checkpointing. To ensure a consistent set of checkpoints are generated by the facility, instead of blindly blocking the processes of a parallel application, only non-deterministic events are blocked. This allows the processes of the parallel application to continue execution during the checkpoint operation. Checkpoints are created by adapting process duplication mechanisms, and checkpoint data is transferred to remote computer memories and disk for storage using the mechanisms of process migration. The services of the group communications facility are used to coordinate the checkpoint operation, and to transport checkpoint data to remote computer memories and disk. Both the group communications facility and the checkpointing facility have been implemented in the GENESIS cluster operating system and provide proof-of-concept. GENESIS uses a microkernel and client-server based operating system architecture, and is demonstrated to provide an appropriate environment for the development of these facilities. We design a number of experiments to test the performance of both the group communications facility and checkpointing facility, and to provide proof-of-performance. We present our approach to testing, the challenges raised in testing the facilities, and how we overcome them. For group communications, we examine the performance of a number of delivery semantics. Good speed-ups are observed and system-oriented group communication services are shown to provide significant performance advantages over user-oriented semantics in the presence of packet loss. For checkpointing, we examine the scalability of the facility given different levels of resource usage and a variable number of computers. Low overheads are observed for checkpointing a parallel application. It is made clear by this research that the microkernel and client-server based cluster operating system provide an ideal environment for the development of a high performance group communications facility and a transparent checkpointing facility for generating a platform for reliable parallel computing on clusters

    2013 Doctoral Workshop on Distributed Systems

    Get PDF
    The Doctoral Workshop on Distributed Systems was held at Les Plans-sur-Bex, Switzerland, from June 26-28, 2013. Ph.D. students from the Universities of Neuchâtel and Bern as well as the University of Applied Sciences of Fribourg presented their current research work and discussed recent research results. This technical report includes the extended abstracts of the talks given during the workshop

    TRANSACTION MANAGEMENT IN MULTI-CORE MAIN-MEMORY DATABASE SYSTEMS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Fault tolerant software technology for distributed computing system

    Get PDF
    Issued as Monthly reports [nos. 1-23], Interim technical report, Technical guide books [nos. 1-2], and Final report, Project no. G-36-64

    Atomic Transfer for Distributed Systems

    Get PDF
    Building applications and information systems increasingly means dealing with concurrency and faults stemming from distribution of system components. Atomic transactions are a well-known method for transferring the responsibility for handling concurrency and faults from developers to the software\u27s execution environment, but incur considerable execution overhead. This dissertation investigates methods that shift some of the burden of concurrency control into the network layer, to reduce response times and increase throughput. It anticipates future programmable network devices, enabling customized high-performance network protocols. We propose Atomic Transfer (AT), a distributed algorithm to prevent race conditions due to messages crossing on a path of network switches. Switches check request messages for conflicts with response messages traveling in the opposite direction. Conflicting requests are dropped, obviating the request\u27s receiving host from detecting and handling the conflict. AT is designed to perform well under high data contention, as concurrency control effort is balanced across a network instead of being handled by the contended endpoint hosts themselves. We use AT as the basis for a new optimistic transactional cache consistency algorithm, supporting execution of atomic applications caching shared data. We then present a scalable refinement, allowing hierarchical consistent caches with predictable performance despite high data update rates. We give detailed I/O Automata models of our algorithms along with correctness proofs. We begin with a simplified model, assuming static network paths and no message loss, and then refine it to support dynamic network paths and safe handling of message loss. We present a trie-based data structure for accelerating conflict-checking on switches, with benchmarks suggesting the feasibility of our approach from a performance stand-point

    Sophisticated Batteryless Sensing

    Get PDF
    Wireless embedded sensing systems have revolutionized scientific, industrial, and consumer applications. Sensors have become a fixture in our daily lives, as well as the scientific and industrial communities by allowing continuous monitoring of people, wildlife, plants, buildings, roads and highways, pipelines, and countless other objects. Recently a new vision for sensing has emerged---known as the Internet-of-Things (IoT)---where trillions of devices invisibly sense, coordinate, and communicate to support our life and well being. However, the sheer scale of the IoT has presented serious problems for current sensing technologies---mainly, the unsustainable maintenance, ecological, and economic costs of recycling or disposing of trillions of batteries. This energy storage bottleneck has prevented massive deployments of tiny sensing devices at the edge of the IoT. This dissertation explores an alternative---leave the batteries behind, and harvest the energy required for sensing tasks from the environment the device is embedded in. These sensors can be made cheaper, smaller, and will last decades longer than their battery powered counterparts, making them a perfect fit for the requirements of the IoT. These sensors can be deployed where battery powered sensors cannot---embedded in concrete, shot into space, or even implanted in animals and people. However, these batteryless sensors may lose power at any point, with no warning, for unpredictable lengths of time. Programming, profiling, debugging, and building applications with these devices pose significant challenges. First, batteryless devices operate in unpredictable environments, where voltages vary and power failures can occur at any time---often devices are in failure for hours. Second, a device\u27s behavior effects the amount of energy they can harvest---meaning small changes in tasks can drastically change harvester efficiency. Third, the programming interfaces of batteryless devices are ill-defined and non- intuitive; most developers have trouble anticipating the problems inherent with an intermittent power supply. Finally, the lack of community, and a standard usable hardware platform have reduced the resources and prototyping ability of the developer. In this dissertation we present solutions to these challenges in the form of a tool for repeatable and realistic experimentation called Ekho, a reconfigurable hardware platform named Flicker, and a language and runtime for timely execution of intermittent programs called Mayfly
    corecore