Lifeguard: Local Health Awareness for More Accurate Failure Detection
SWIM is a peer-to-peer group membership protocol with attractive scaling and
robustness properties. However, slow message processing can cause SWIM to mark
healthy members as failed (so-called false positive failure detection), despite
inclusion of a mechanism to avoid this.
We identify the properties of SWIM that lead to the problem, and propose
Lifeguard, a set of extensions to SWIM which consider that the local failure
detector module may be at fault, via the concept of local health. We evaluate
this approach in a precisely controlled environment and validate it in a
real-world scenario, showing that it drastically reduces the rate of false
positives. The false positive rate and detection time for true failures can be
reduced simultaneously, compared to the baseline levels of SWIM.
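As a rough illustration of the local-health idea, the sketch below scales a probe timeout by a saturating counter that rises when the local node's own probes go unanswered and falls when they succeed. The class name, the counter bound, and the linear scaling rule are assumptions made for illustration; the abstract does not specify them.

```python
class LocalHealth:
    """Hypothetical saturating counter: how suspect is the local detector?"""

    def __init__(self, max_score=8):
        # max_score is an assumed upper bound, not a value from the abstract.
        self.max_score = max_score
        self.score = 0  # 0 means the local node currently looks healthy

    def probe_succeeded(self):
        # A timely reply is evidence that local message processing is fine.
        self.score = max(0, self.score - 1)

    def probe_failed(self):
        # A missed reply may be the local node's fault, so raise the score.
        self.score = min(self.max_score, self.score + 1)

    def scaled_timeout(self, base_timeout):
        # Assumed rule: wait longer before suspecting others when the
        # local node itself may be slow.
        return base_timeout * (self.score + 1)
```

With this rule, a node that keeps missing replies waits progressively longer before declaring a peer failed, which is one way a slow local detector could avoid accusing healthy members.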
An Analysis of Distributed Systems Syllabi With a Focus on Performance-Related Topics
We analyze a dataset of 51 current (2019-2020) Distributed Systems syllabi
from top Computer Science programs, focusing on finding the prevalence and
context in which topics related to performance are being taught in these
courses. We also study the scale of the infrastructure mentioned in DS courses,
from small client-server systems to cloud-scale, peer-to-peer, global-scale
systems. We make eight main findings, covering goals such as performance, and
scalability and its variant elasticity; activities such as performance
benchmarking and monitoring; eight selected performance-enhancing techniques
(replication, caching, sharding, load balancing, scheduling, streaming,
migrating, and offloading); and control issues such as trade-offs that include
performance and performance variability.
Comment: Accepted for publication at WEPPE 2021, to be held in conjunction
with ACM/SPEC ICPE 2021: https://doi.org/10.1145/3447545.3451197 This article
is a follow-up of our prior ACM SIGCSE publication, arXiv:2012.0055
RoBuSt: A Crash-Failure-Resistant Distributed Storage System
In this work we present the first distributed storage system that is provably
robust against crash failures issued by an adaptive adversary, i.e., for each
batch of requests the adversary can decide based on the entire system state
which servers will be unavailable for that batch of requests. Despite up to
crashed servers, with constant and
denoting the number of servers, our system can correctly process any batch of
lookup and write requests (with at most a polylogarithmic number of requests
issued at each non-crashed server) in at most a polylogarithmic number of
communication rounds, with at most polylogarithmic time and work at each server
and only a logarithmic storage overhead.
Our system is based on previous work by Eikel and Scheideler (SPAA 2013), who
presented IRIS, a distributed information system that is provably robust
against the same kind of crash failures. However, IRIS is only able to serve
lookup requests. Handling both lookup and write requests has turned out to
require major changes in the design of IRIS.
Comment: Revised full versio
Identifying Node Falls In Mobile Wireless Networks: A Probabilistic Method
We take a probabilistic approach and propose two node failure detection schemes that systematically combine localized monitoring, location estimation and node collaboration. Extensive simulation results in both connected and disconnected networks demonstrate that our schemes achieve high failure detection rates (close to an upper bound) and low false positive rates, and incur low communication overhead. Compared to approaches that use centralized monitoring, our approach has up to 80% lower communication overhead, and only slightly lower detection rates and slightly higher false positive rates. In addition, our approach has the advantage that it is applicable to both connected and disconnected networks, while centralized monitoring is only applicable to connected networks.
The eventual leadership in dynamic mobile networking environments
2007-2008 > Academic research: refereed > Refereed conference paper. Version of Record. Publishe
Node Failure Detection And Fault Management In Mobile Wireless Networks With Persistent Connectivity
We take a probabilistic approach and propose two node failure detection schemes that systematically combine localized monitoring, location estimation and node collaboration. Extensive simulation results in both connected and disconnected networks show that our schemes achieve high failure detection rates (close to an upper bound) and low false positive rates, and incur low communication overhead. Compared to approaches that use centralized monitoring, our approach has up to 80% lower communication overhead, and only slightly lower detection rates and slightly higher false positive rates. Also, our approach has the advantage that it is applicable to both connected and disconnected networks, while centralized monitoring is only applicable to connected networks.
Failure Detectors for Wireless Sensor-Actuator Systems
Wireless sensor-actuator systems (WSAS) offer exciting opportunities for emerging applications by facilitating fine-grained monitoring and control, and dense instrumentation. The large scale of such systems increases the need for them to tolerate and cope with failures in a localized and decentralized manner. We present abstractions for detecting node failures and link failures caused by topology changes in a WSAS. These abstractions were designed and implemented as a set of reusable components in nesC under TinyOS. We present results from experiments on an 80-node testbed that demonstrate the performance and viability of the abstractions. In the future, these abstractions can be extended to detect and cope with larger classes of failures in WSAS.
Automated Application-level Checkpointing of MPI Programs
Because of increasing hardware and software complexity, the running
time of many computational science applications is now more than the mean-time-to-failure of high-performance computing platforms. Therefore, computational science applications need to tolerate hardware failures. In this paper, we focus on the stopping failure model in which a faulty process hangs and stops responding to the rest of the system. We argue that tolerating such faults is best done by an approach called application-level coordinated non-blocking checkpointing, and that existing fault-tolerance protocols in the literature are not suitable for implementing this approach. In this paper, we present a suitable protocol, and show how it can be used with a precompiler that instruments C/MPI programs to save application and MPI library state. An advantage of our approach is that it is independent of the MPI implementation. We present experimental results that argue that the overhead of using our system can be small.
Right On Time Distributed Shared Memory
The demand for real-time data storage in distributed control systems (DCSs) is growing. Yet, providing real-time DCS guarantees is challenging, especially when more and more sensor and actuator devices are connected to industrial plants and message loss needs to be taken into account. In this paper, we investigate how to build a shared memory abstraction for DCSs as a first step towards implementing different shared storage systems in a DCS context. We first prove that, in the presence of host crashes and message losses, the necessary guarantees of such an abstraction are impossible to implement using a traditional approach that has no access to the internals of existing DCS services, e.g., a modular approach where algorithms are built on top of existing software blocks like failure detectors. We propose a white-box approach that utilizes messages of existing services in any DCS as the sole means of communication. More precisely, we present TapeWorm, an algorithm that attaches itself to the heartbeat messages of the failure detector component in DCSs. We prove that TapeWorm implements the desired shared memory guarantees for applications running on a DCS. We also analyze the performance of TapeWorm and we showcase ways of adapting TapeWorm to various application needs and workloads.
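The general idea of attaching shared-memory traffic to failure-detector heartbeats can be sketched as follows. This is not the TapeWorm algorithm and gives none of the guarantees proven in the paper; it is a minimal illustration in which every name (`Heartbeat`, `Node`, the `(timestamp, value)` payload) is hypothetical and the merge rule (keep the highest-timestamped value) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Heartbeat:
    """Hypothetical heartbeat message with room for a piggybacked payload."""
    sender: str
    seq: int
    payload: Optional[Tuple[int, str]] = None  # (timestamp, value)


class Node:
    def __init__(self, name):
        self.name = name
        self.seq = 0
        self.register = (0, "")  # locally known (timestamp, value)

    def write(self, value):
        # Bump the timestamp so the new value supersedes what is known locally.
        self.register = (self.register[0] + 1, value)

    def next_heartbeat(self):
        # No extra message is sent: the register rides on the heartbeat.
        self.seq += 1
        return Heartbeat(self.name, self.seq, self.register)

    def on_heartbeat(self, hb):
        # Adopt the payload only if it is newer; tuple comparison breaks
        # timestamp ties deterministically by value.
        if hb.payload is not None and hb.payload > self.register:
            self.register = hb.payload
```

Because the heartbeat is sent anyway, a lost message costs nothing extra: the next heartbeat carries the same or a newer register value.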
Toward Reliable and Efficient Message Passing Software for HPC Systems: Fault Tolerance and Vector Extension
As the scale of High-Performance Computing (HPC) systems continues to grow, researchers devote themselves to achieving the best performance when running long computing jobs on these systems. My research focuses on the reliability and efficiency of HPC software.
First, as systems become larger, the mean-time-to-failure (MTTF) of these HPC systems is negatively impacted and tends to decrease. Handling system failures becomes a prime challenge. My research aims to present a general design and implementation of an efficient runtime-level failure detection and propagation strategy targeting large-scale, dynamic systems that is able to detect both node and process failures. It uses multiple overlapping topologies to optimize detection and propagation, minimizing the incurred overhead and guaranteeing the scalability of the entire framework. Results from different machines and benchmarks, compared to related works, show that my design and implementation outperforms non-HPC solutions significantly, and is competitive with specialized HPC solutions that can manage only MPI applications.
Second, I endeavor to employ instruction-level parallelization to achieve optimal performance. Novel processors support long vector extensions, which enable researchers to exploit the potential peak performance of target architectures. Intel introduced Advanced Vector Extensions (AVX512 and AVX2) instructions for the x86 Instruction Set Architecture (ISA). Arm introduced the Scalable Vector Extension (SVE) with a new set of A64 instructions. Both enable greater parallelism. My research utilizes long vector reduction instructions to improve the performance of MPI reduction operations. Also, I use the gather and scatter features to speed up the packing and unpacking operations in MPI. The evaluation of the resulting software stack under different scenarios demonstrates that the approach is not only efficient but also generalizable to many vector architectures.