
    Fast Self-Healing Gradients

    We present CRF-Gradient, a self-healing gradient algorithm that provably reconfigures in O(diameter) time. Self-healing gradients are a frequently used building block for distributed self-healing systems, but previous algorithms either have a healing rate limited by the shortest link in the network or must rebuild invalid regions from scratch. We have verified CRF-Gradient in simulation and on a network of Mica2 motes. Our approach can also be generalized and applied to create other self-healing calculations, such as cumulative probability fields.

    Policy-based autonomic control service

    Recently, there has been considerable interest in policy-based, goal-oriented service management and autonomic computing. Much work is still required to investigate designs, policy models, and associated meta-reasoning systems for policy-based autonomic systems. In this paper we outline a proposed autonomic middleware control service used to orchestrate self-healing of distributed applications. Policies are used to adjust the system's autonomy and to define self-healing strategies that stabilize or correct a given system in the event of failures.

    Use of Self-Healing Techniques for Highly-Available Distributed Monitoring

    The paper addresses the self-healing aspects of monitoring systems. For today's complex distributed systems, the monitoring system should become "intelligent": as a first step, it can guide the user on what should be monitored. The next level of "intelligence" can be described by the term "self-healing". The goal is for decisions made automatically by the monitoring system to make the monitored system behave in a more stable, reliable, and predictable way. The paper presents a new monitoring system: AgeMon, an agent-based, distributed monitoring system with strictly defined roles that the agents can perform. We discuss self-healing in the context of monitoring. A good example of self-healing in the monitoring system itself is the case where monitoring data could be lost because of storage problems. AgeMon can handle such problems and automatically elects substitute persistence agents to store the data.

    Self-Healing Distributed Scheduling Platform

    Distributed systems require effective mechanisms to manage the reliable provisioning of computational resources from different and distributed providers. Moreover, the dynamic environment that affects the behaviour of such systems and the complexity of these dynamics demand autonomous capabilities to ensure the behaviour of distributed scheduling platforms and to achieve business and user objectives. In this paper we propose a self-adaptive distributed scheduling platform composed of multiple agents implemented as intelligent feedback control loops to support policy-based scheduling and expose self-healing capabilities. Our platform leverages distributed scheduling processes by (i) allowing each provider to maintain its own internal scheduling process, and (ii) implementing self-healing capabilities based on agent module recovery. Simulated tests are performed to determine the optimal number of agents to be used in the negotiation phase without affecting the scheduling cost function. Test results on a real-life platform are presented to evaluate recovery times and optimize platform parameters.

    Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems

    This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliability are designed to be self-configuring to adapt automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an "expert system" that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group. Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS), Washington, DC, April, 200

    Towards Automotive Embedded Systems with Self-X Properties

    With self-adaptation and self-organization, new paradigms for the management of distributed systems have been introduced. Enhancing the automotive software system with self-X capabilities, e.g. self-healing, self-configuration and self-optimization, handles complexity while increasing the flexibility, scalability and dependability of these systems. In this chapter we present an approach for enhancing automotive systems with self-X properties. First, we discuss the benefits of providing automotive software systems with self-management capabilities and outline concrete use cases. We then discuss requirements and challenges for realizing adaptive automotive embedded systems.

    Self-Healing Systems: Foundations and Challenges

    The term and characteristic of self-healing, applied to systems, is often seen from different fields of computer science, such as fault tolerance or network and service management, with diverging semantics. Since this impression was also confirmed during the first discussions of the Dagstuhl seminar on "Self-Healing and Self-Adaptive Systems", a seminar's working group on "Terminology" was formed with the objective to address the question of finding commonalities and differences in a self-healing characteristic of stand-alone and distributed systems. The outcomes of the discussion in terms of foundations, the description of the self-healing process and the identification of the main challenges of such self-healing systems are presented.