7,317 research outputs found

    Lifeguard: Local Health Awareness for More Accurate Failure Detection

    Full text link
    SWIM is a peer-to-peer group membership protocol with attractive scaling and robustness properties. However, slow message processing can cause SWIM to mark healthy members as failed (so called false positive failure detection), despite inclusion of a mechanism to avoid this. We identify the properties of SWIM that lead to the problem, and propose Lifeguard, a set of extensions to SWIM which consider that the local failure detector module may be at fault, via the concept of local health. We evaluate this approach in a precisely controlled environment and validate it in a real-world scenario, showing that it drastically reduces the rate of false positives. The false positive rate and detection time for true failures can be reduced simultaneously, compared to the baseline levels of SWIM

    Failure detection and repair of threads in CTAS

    Get PDF
    Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.Includes bibliographical references (p. 73).Reliable, error-free software is hard to come by, and this is especially true for newer, larger, or more complex programs. CTAS, an air traffic control tool, falls into this category, making it a good candidate for research on error compensation. Specifically, this thesis addresses the issue of thread crashes in one portion of CTAS. We reimplement the thread structure in question around a simpler problem, and develop a failure detector and an accompanying repair mechanism to monitor it. These add-on components provide the application with thread consistency by swiftly and transparently recovering from crashes, thereby yielding a more stable, self-sufficient, and generally more reliable operating environment.by Farid Jahanmir.M.Eng.and S.B

    Index to 1984 NASA Tech Briefs, volume 9, numbers 1-4

    Get PDF
    Short announcements of new technology derived from the R&D activities of NASA are presented. These briefs emphasize information considered likely to be transferrable across industrial, regional, or disciplinary lines and are issued to encourage commercial application. This index for 1984 Tech B Briefs contains abstracts and four indexes: subject, personal author, originating center, and Tech Brief Number. The following areas are covered: electronic components and circuits, electronic systems, physical sciences, materials, life sciences, mechanics, machinery, fabrication technology, and mathematics and information sciences

    CaloGAN: Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks

    Full text link
    The precise modeling of subatomic particle interactions and propagation through matter is paramount for the advancement of nuclear and particle physics searches and precision measurements. The most computationally expensive step in the simulation pipeline of a typical experiment at the Large Hadron Collider (LHC) is the detailed modeling of the full complexity of physics processes that govern the motion and evolution of particle showers inside calorimeters. We introduce \textsc{CaloGAN}, a new fast simulation technique based on generative adversarial networks (GANs). We apply these neural networks to the modeling of electromagnetic showers in a longitudinally segmented calorimeter, and achieve speedup factors comparable to or better than existing full simulation techniques on CPU (100×100\times-1000×1000\times) and even faster on GPU (up to ∼105×\sim10^5\times). There are still challenges for achieving precision across the entire phase space, but our solution can reproduce a variety of geometric shower shape properties of photons, positrons and charged pions. This represents a significant stepping stone toward a full neural network-based detector simulation that could save significant computing time and enable many analyses now and in the future.Comment: 14 pages, 4 tables, 13 figures; version accepted by Physical Review D (PRD

    Worldwide consensus

    Get PDF
    Consensus is an abstraction of a variety of important challenges in dependable distributed systems. Thus a large body of theoretical knowledge is focused on modeling and solving consensus within different system assumptions. However, moving from theory to practice imposes compromises and design decisions that may impact the elegance, trade-offs and correctness of theoretical appealing consensus protocols. In this paper we present the implementation and detailed analysis, in a real environment with a large number of nodes, of mutable consensus, a theoretical appealing protocol able to offer a wide range of trade-offs (called mutations) between decision latency and message complexity. The analysis sheds light on the fundamental behavior of the mutations, and leads to the identification of problems related to the real environment. Such problems are addressed without ever affecting the correctness of the theoretical proposal

    Implementing the weakest failure detector for solving consensus

    Get PDF
    The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as the consensus problem. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First two simple algorithms of the weakest class to solve the consensus problem, namely the Eventually Strong class (⋄S), are presented. While the first algorithm is wait-free, the second algorithm is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e. they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (⋄P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever

    Self-management Framework for Mobile Autonomous Systems

    Get PDF
    The advent of mobile and ubiquitous systems has enabled the development of autonomous systems such as wireless-sensors for environmental data collection and teams of collaborating Unmanned Autonomous Vehicles (UAVs) used in missions unsuitable for humans. However, with these range of new application domains comes a new challenge – enabling self-management in mobile autonomous systems. The primary challenge in using autonomous systems for real-life missions is shifting the burden of management from humans to these systems themselves without loss of the ability to adapt to failures, changes in context, and changing user requirements. Autonomous systems have to be able to manage themselves individually as well as to form self-managing teams that are able to recover or adapt to failures, protect themselves from attacks and optimise performance. This thesis proposes a novel distributed policy-based framework that enables autonomous systems to perform self management individually and as a team. The framework allows missions to be specified in terms of roles in an adaptable and reusable way, enables dynamic and secure team formation with a utility-based approach for optimal role assignment, caters for communication link maintenance among team members and recovery from failure. Adaptive management is achieved by employing an architecture that uses policy-based techniques to allow dynamic modification of the management strategy relating to resources, role behaviour, team and communications management, without reloading the basic software within the system. Evaluation of the framework shows that it is scalable with respect to the number of roles, and consequently the number of autonomous systems participating in the mission. It is also shown to be optimal with respect to role assignments, and robust to intermittent communication link disconnections and permanent team-member failures. The prototype implementation was tested on mobile robots as a proof-ofconcept demonstration
    • …
    corecore