132,173 research outputs found
Fast Self-Healing Gradients
We present CRF-Gradient, a self-healing gradient algorithm that provably reconfigures in O(diameter) time. Self-healing gradients are a frequently used building block for distributed self-healing systems, but previous algorithms either have a healing rate limited by the shortest link in the network or must rebuild invalid regions from scratch. We have verified CRF-Gradient in simulation and on a network of Mica2 motes. Our approach can also be generalized and applied to create other self-healing calculations, such as cumulative probability fields
Policy-based autonomic control service
Recently, there has been a considerable interest in policy-based, goal-oriented service management and autonomic computing. Much work is still required to investigate designs and policy models and associate meta-reasoning systems for policy-based autonomic systems. In this paper we outline a proposed autonomic middleware control service used to orchestrate selfhealing of distributed applications. Policies are used to adjust the systems autonomy and define self-healing strategies to stabilize/correct a given system in the event of failures
Use of Self-Healing Techniques for Highly-Available Distributed Monitoring
The paper addresses the self-healing aspects of the monitoring systems. Nowadays, when the complex distributed systems are concerned, the monitoring system should become "intelligent" - as the first step it can guide the user what should be monitored. The next level of the "intelligence" can be described by the term "self-healing". The goal is to provide the capability that a decision made automatically by the monitoring system should force the system under monitoring to behave more stable, reliable and predictable. In the paper a new monitoring system is presented: AgeMon is an agent based, distributed monitoring system with strictly defined roles which can be performed by the agents. In the paper we discuss self-healing in the context of monitoring. When the self-healing of the monitoring system is concerned, a good example is the case where it is possible to lose the monitoring data due to the storage problems. AgeMon can handle such problems and automatically elects substitute persistence agents to store the data
Recommended from our members
Engineering emergence for cluster configuration
Distributed applications are being deployed on ever-increasing scale and with ever-increasing functionality. Due to the accompanying increase in behavioural complexity, self-management abilities, such as self-healing, have become core requirements. A key challenge is the smooth embedding of such functionality into our systems.
Natural distributed systems such as ant colonies have evolved highly efficient behaviour. These emergent systems achieve high scalability through the use of low complexity communication strategies and are highly robust through large-scale replication of simple, anonymous entities. Ways to engineer this fundamentally non-deterministic behaviour for use in distributed applications are being explored.
An emergent, dynamic, cluster management scheme, which forms part of a hierarchical resource management architecture, is presented. Natural biological systems, which embed self-healing behaviour at several levels, have influenced the architecture. The resulting system is a simple, lightweight and highly robust platform on which cluster-based autonomic applications can be deployed
Self-Healing Distributed Scheduling Platform
International audienceDistributed systems require effective mechanisms to manage the reliable provisioning of computational resources from different and distributed providers. Moreover, the dynamic environment that affects the behaviour of such systems and the complexity of these dynamics demand autonomous capabilities to ensure the behaviour of distributed scheduling platforms and to achieve business and user objectives. In this paper we propose a self-adaptive distributed scheduling platform composed of multiple agents implemented as intelligent feedback control loops to support policy-based scheduling and expose self-healing capabilities. Our platform leverages distributed scheduling processes by (i) allowing each provider to maintain its own internal scheduling process, and (ii) implementing self-healing capabilities based on agent module recovery. Simulated tests are performed to determine the optimal number of agents to be used in the negotiation phase without affecting the scheduling cost function. Test results on a real-life platform are presented to evaluate recovery times and optimize platform parameters
Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems
This paper describes a comprehensive prototype of large-scale fault adaptive
embedded software developed for the proposed Fermilab BTeV high energy physics
experiment. Lightweight self-optimizing agents embedded within Level 1 of the
prototype are responsible for proactive and reactive monitoring and mitigation
based on specified layers of competence. The agents are self-protecting,
detecting cascading failures using a distributed approach. Adaptive,
reconfigurable, and mobile objects for reliablility are designed to be
self-configuring to adapt automatically to dynamically changing environments.
These objects provide a self-healing layer with the ability to discover,
diagnose, and react to discontinuities in real-time processing. A generic
modeling environment was developed to facilitate design and implementation of
hardware resource specifications, application data flow, and failure mitigation
strategies. Level 1 of the planned BTeV trigger system alone will consist of
2500 DSPs, so the number of components and intractable fault scenarios involved
make it impossible to design an `expert system' that applies traditional
centralized mitigative strategies based on rules capturing every possible
system state. Instead, a distributed reactive approach is implemented using the
tools and methodologies developed by the Real-Time Embedded Systems group.Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th
Annual IEEE International Conference and Workshop on the Engineering of
Computer Based Systems (ECBS), Washington, DC, April, 200
Towards Automotive Embedded Systems with Self-X Properties
With self-adaptation and self-organization new paradigms for the management of distributed systems have been introduced. By enhancing the automotive software system with self-X capabilities, e.g. self-healing, self-configuration and self-optimization, the complexity is handled while increasing the flexibility, scalability and dependability of these systems. In this chapter we present an approach for enhancing automotive systems with self-X properties. At first, we discuss the benefits of providing automotive software systems with self-management capabilities and outline concrete use cases. Afterwards, we will discuss requirements and challenges for realizing adaptive automotive embedded systems
Recommended from our members
The role of smart sensor networks for voltage monitoring in smart grids
The large-scale deployment of the Smart Grid paradigm will support the evolution of conventional electrical power systems toward active, flexible and self-healing web energy networks composed of distributed and cooperative energy resources. In a Smart Grid platform, distributed voltage monitoring is one of the main issues to address. In this field, the application of traditional hierarchical monitoring paradigms has some disadvantages that could hinder their application in Smart Grids where the constant growth of grid complexity and the need for massive pervasion of Distribution Generation Systems (DGS) require more scalable, more flexible control and regulation paradigms. To try to overcome these challenges, this paper proposes the concept of a decentralized non-hierarchal voltage monitoring architecture based on intelligent and cooperative smart entities. These devices employ traditional sensors to acquire local bus variables and mutually coupled oscillators to assess the main variables describing the global grid state
Self-Healing Systems: Foundations and Challenges
The term and characteristic of self-healing, applied to systems, is often seen from different fields of computer science, such as fault tolerance or network and service management, with diverging semantics. Since this impression was confirmed also during the first discussions of the Dagstuhl seminar on "Self-Healing and Self-Adaptive Systems", a seminar\u27s working group on "Terminology" was formed with the objective to address the question of finding commonalities and differences in a self-healing characteristic of stand-alone and distributed systems. The outcomes of the discussion in terms of foundations, the description of the self-healing process and the identification of the main challenges of such self-healing systems are presented
- …