1,758 research outputs found

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Get PDF
    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads

    Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems

    Get PDF
    In this paper we are interested in mixed-criticality embed-ded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have no dependability requirements. We use Earliest Deadline First (EDF) scheduling for the hard tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The CBS pa-rameters determine the quality of service (QoS) of soft tasks. Transient faults are tolerated using checkpointing with roll-back recovery. For tolerating permanent faults in proces-sors, we use task migration, i.e., restarting the safety-critical tasks on other processors. We propose a Greedy-based on-line heuristic for the migration of safety-critical tasks, in response to permanent faults, and the adjustment of CBS parameters on the target processors, such that the faults are tolerated, the deadlines for the hard real-time tasks are sat-isfied and the QoS for soft tasks is maximized. The proposed online adaptive approach has been evaluated using several synthetic benchmarks and a real-life case study. 1

    Flexible multi-layer virtual machine design for virtual laboratory in distributed systems and grids.

    Get PDF
    We propose a flexible Multi-layer Virtual Machine (MVM) design intended to improve efficiencies in distributed and grid computing and to overcome the known current problems that exist within traditional virtual machine architectures and those used in distributed and grid systems. This thesis presents a novel approach to building a virtual laboratory to support e-science by adapting MVMs within the distributed systems and grids, thereby providing enhanced flexibility and reconfigurability by raising the level of abstraction. The MVM consists of three layers. They are OS-level VM, queue VMs, and components VMs. The group of MVMs provides the virtualized resources, virtualized networks, and reconfigurable components layer for virtual laboratories. We demonstrate how our reconfigurable virtual machine can allow software designers and developers to reuse parallel communication patterns. In our framework, the virtual machines can be created on-demand and their applications can be distributed at the source-code level, compiled and instantiated in runtime. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .K56. Source: Masters Abstracts International, Volume: 44-03, page: 1405. Thesis (M.Sc.)--University of Windsor (Canada), 2005

    Kompics: a message-passing component model for building distributed systems

    Get PDF
    The Kompics component model and programming framework was designedto simplify the development of increasingly complex distributed systems. Systems built with Kompics leverage multi-core machines out of the box and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic debugging and reproducible performance evaluation of unmodified Kompics distributed systems. We describe the component model and show how to program and compose event-based distributed systems. We present the architectural patterns and abstractions that Kompics facilitates and we highlight a case study of a complex distributed middleware that we have built with Kompics. We show how our approach enables systematic development and evaluation of large-scale and dynamic distributed systems

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    Get PDF
    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

    Self-stabilization Overhead: an Experimental Case Study on Coded Atomic Storage

    Full text link
    Shared memory emulation can be used as a fault-tolerant and highly available distributed storage solution or as a low-level synchronization primitive. Attiya, Bar-Noy, and Dolev were the first to propose a single-writer, multi-reader linearizable register emulation where the register is replicated to all servers. Recently, Cadambe et al. proposed the Coded Atomic Storage (CAS) algorithm, which uses erasure coding for achieving data redundancy with much lower communication cost than previous algorithmic solutions. Although CAS can tolerate server crashes, it was not designed to recover from unexpected, transient faults, without the need of external (human) intervention. In this respect, Dolev, Petig, and Schiller have recently developed a self-stabilizing version of CAS, which we call CASSS. As one would expect, self-stabilization does not come as a free lunch; it introduces, mainly, communication overhead for detecting inconsistencies and stale information. So, one would wonder whether the overhead introduced by self-stabilization would nullify the gain of erasure coding. To answer this question, we have implemented and experimentally evaluated the CASSS algorithm on PlanetLab; a planetary scale distributed infrastructure. The evaluation shows that our implementation of CASSS scales very well in terms of the number of servers, the number of concurrent clients, as well as the size of the replicated object. More importantly, it shows (a) to have only a constant overhead compared to the traditional CAS algorithm (which we also implement) and (b) the recovery period (after the last occurrence of a transient fault) is as fast as a few client (read/write) operations. Our results suggest that CASSS does not significantly impact efficiency while dealing with automatic recovery from transient faults and bounded size of needed resources

    Accelerated artificial neural networks on FPGA for fault detection in automotive systems

    Get PDF
    Modern vehicles are complex distributed systems with critical real-time electronic controls that have progressively replaced their mechanical/hydraulic counterparts, for performance and cost benefits. The harsh and varying vehicular environment can induce multiple errors in the computational/communication path, with temporary or permanent effects, thus demanding the use of fault-tolerant schemes. Constraints in location, weight, and cost prevent the use of physical redundancy for critical systems in many cases, such as within an internal combustion engine. Alternatively, algorithmic techniques like artificial neural networks (ANNs) can be used to detect errors and apply corrective measures in computation. Though adaptability of ANNs presents advantages for fault-detection and fault-tolerance measures for critical sensors, implementation on automotive grade processors may not serve required hard deadlines and accuracy simultaneously. In this work, we present an ANN-based fault-tolerance system based on hybrid FPGAs and evaluate it using a diesel engine case study. We show that the hybrid platform outperforms an optimised software implementation on an automotive grade ARM Cortex M4 processor in terms of latency and power consumption, also providing better consolidation
    • …
    corecore