Search CORE

40 research outputs found

Recommended from our members

Adding Self-healing capabilities to the Common Language Runtime

Author: Griffith Rean
Kaiser Gail E.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2005
Field of study

Self-healing systems require that repair mechanisms be available to resolve problems that arise while the system executes. Managed execution environments such as the Common Language Runtime (CLR) and Java Virtual Machine (JVM) provide a number of application services (application isolation, security sandboxing, garbage collection and structured exception handling) that are geared primarily toward making managed applications more robust. However, none of these services directly enables applications to perform repairs or consistency checks of their components. From a design and implementation standpoint, the preferred way to enable repair in a self-healing system is to use an externalized repair/adaptation architecture rather than hardwiring adaptation logic inside the system, where it is harder to analyze, reuse and extend. We present a framework that allows a repair engine to dynamically attach to and detach from a managed application while it executes, essentially adding repair mechanisms as another application service provided in the execution environment.

Columbia University Academic Commons
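
The attach/detach idea can be illustrated, very loosely, in Python. This is not the CLR mechanism the paper describes; it is an analogue under stated assumptions: a hypothetical `RepairEngine` wraps selected methods of a live object with a consistency check and a repair action, and restores the original behavior on detach, without the component's source being changed or reloaded. `RepairEngine`, `BoundedBuffer`, `within_capacity` and `drop_oldest` are illustrative names, not an API from the paper.

```python
import functools
from typing import Any, Callable, Dict, Iterable, Optional


class RepairEngine:
    """Hypothetical external repair engine (illustrative, not the paper's API).

    attach() shadows selected methods of a live object with wrappers that run a
    consistency check after each call and repair on failure; detach() removes
    the wrappers so the class's original methods are used again.
    """

    def __init__(self, check: Callable[[Any], bool], repair: Callable[[Any], None]) -> None:
        self.check = check
        self.repair = repair
        self._target: Optional[Any] = None
        self._wrapped: Dict[str, Callable[..., Any]] = {}

    def attach(self, target: Any, method_names: Iterable[str]) -> None:
        self._target = target
        for name in method_names:
            original = getattr(target, name)  # bound method of the running instance

            @functools.wraps(original)
            def wrapper(*args: Any, _orig: Callable[..., Any] = original, **kwargs: Any) -> Any:
                result = _orig(*args, **kwargs)
                if not self.check(target):  # consistency check after the call
                    self.repair(target)     # externalized repair action
                return result

            # An instance attribute shadows the class method for this object only.
            setattr(target, name, wrapper)
            self._wrapped[name] = wrapper

    def detach(self) -> None:
        for name in self._wrapped:
            delattr(self._target, name)  # the class method becomes visible again
        self._wrapped.clear()
        self._target = None


class BoundedBuffer:
    """Toy component with a seeded fault: put() ignores the capacity limit."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: list = []

    def put(self, item: Any) -> None:
        self.items.append(item)


def within_capacity(buf: BoundedBuffer) -> bool:
    return len(buf.items) <= buf.capacity


def drop_oldest(buf: BoundedBuffer) -> None:
    # Trim the oldest entries so the buffer is back within capacity.
    del buf.items[: len(buf.items) - buf.capacity]
```

Usage: create `engine = RepairEngine(within_capacity, drop_oldest)` and call `engine.attach(buf, ["put"])`; while attached, overflowing puts are trimmed back to capacity, and after `engine.detach()` the unrepaired behavior returns. Wrapping at the method boundary keeps the repair logic outside `BoundedBuffer`, which is the separation the abstract argues for.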

RAS-Models: A Building Block for Self-Healing Benchmarks

Author: Griffith Rean
Kaiser Gail E.
Virmani Ritika
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

To evaluate the efficacy of self-healing systems, a rigorous, objective, quantitative benchmarking methodology is needed. However, developing such a benchmark is a non-trivial task given the many evaluation issues to be resolved, including but not limited to quantifying the impacts of faults, analyzing various styles of healing (reactive, preventative, proactive), accounting for partially automated healing and accounting for incomplete/imperfect healing. We posit, however, that it is possible to realize a self-healing benchmark using a collection of analytical techniques and practical tools as building blocks. This paper highlights the flexibility of one analytical tool, the Reliability, Availability and Serviceability (RAS) model, and illustrates its power and relevance to the problem of evaluating self-healing mechanisms/systems when combined with practical tools for fault injection.

CiteSeerX

Columbia University Academic Commons

The Role of Reliability, Availability and Serviceability (RAS) Models in the Design and Evaluation of Self-Healing Systems

Author: Griffith Rean
Kaiser Gail E.
Virmani Ritika
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

In an idealized scenario, self-healing systems predict, prevent or diagnose problems and take the appropriate actions to mitigate their impact with minimal human intervention. To determine how close we are to reaching this goal, we require analytical techniques and practical approaches that allow us to quantify the effectiveness of a system's remediation mechanisms. In this paper we apply analytical techniques based on Reliability, Availability and Serviceability (RAS) models to evaluate individual remediation mechanisms of select system components and their combined effects on the system. We demonstrate the applicability of RAS models to the evaluation of self-healing systems by using them to analyze various styles of remediation (reactive, preventative, etc.), quantify the impact of imperfect remediations, identify sub-optimal (less effective) remediations and quantify the combined effects of all the activated remediations on the system as a whole.

CiteSeerX

Columbia University Academic Commons
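
To make the RAS-model idea concrete, here is a minimal sketch, not the models from the papers above: a three-state continuous-time Markov chain for one component with a hypothetical failure rate `lam`, automated-remediation rate `mu`, remediation coverage `coverage` (the probability that automated remediation succeeds) and manual-repair rate `mu_manual`. Steady-state availability is the probability of being in the UP state, so lowering `coverage` shows how an imperfect remediation costs availability. All state names, parameter names and values are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def steady_state(states: List[str], rates: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Solve pi Q = 0 with sum(pi) = 1 for a small CTMC with a unique stationary distribution."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    # Generator matrix Q: off-diagonal entries are transition rates; each row sums to zero.
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate
    # pi Q = 0 is the linear system Q^T pi = 0; replace one redundant equation with sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        pi[r] = (b[r] - sum(a[r][c] * pi[c] for c in range(r + 1, n))) / a[r][r]
    return {s: pi[i] for s, i in idx.items()}


def remediation_availability(lam: float, mu: float, coverage: float, mu_manual: float) -> float:
    """Steady-state availability of a hypothetical UP -> AUTO_REPAIR -> (UP | MANUAL_REPAIR) model."""
    states = ["UP", "AUTO_REPAIR", "MANUAL_REPAIR"]
    rates = {
        ("UP", "AUTO_REPAIR"): lam,                             # failure detected, remediation starts
        ("AUTO_REPAIR", "UP"): coverage * mu,                   # remediation succeeds
        ("AUTO_REPAIR", "MANUAL_REPAIR"): (1 - coverage) * mu,  # remediation fails, escalate
        ("MANUAL_REPAIR", "UP"): mu_manual,                     # operator repair completes
    }
    return steady_state(states, rates)["UP"]


def closed_form_availability(lam: float, mu: float, coverage: float, mu_manual: float) -> float:
    # From the balance equations: pi_AUTO = (lam / mu) pi_UP and
    # pi_MANUAL = ((1 - coverage) * lam / mu_manual) pi_UP, then normalize.
    return 1.0 / (1.0 + lam / mu + (1.0 - coverage) * lam / mu_manual)
```

The generic `steady_state` helper is kept separate so the same approach can be reused for other hypothetical remediation styles by changing only the states and rates; the closed form serves as a cross-check for this particular chain.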

Effecting Runtime Reconfiguration in Managed Execution Environments

Author: Griffith Rean
Kaiser Gail E.
Valetto Giuseppe
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2005

Managed execution environments such as Microsoft's Common Language Runtime (CLR) and Sun Microsystems' Java Virtual Machine (JVM) provide a number of services (including but not limited to application isolation, security sandboxing, garbage collection and structured exception handling) that are aimed primarily at enhancing the robustness of managed applications. However, none of these services directly enables performing reconfigurations, repairs or diagnostics on the managed applications and/or their constituent subsystems and components. In this paper we examine how the facilities of a managed execution environment can be leveraged to support runtime system adaptations, such as reconfigurations and repairs. We describe an adaptation framework we have developed, which uses these facilities to dynamically attach/detach an engine capable of performing reconfigurations and repairs on a target system while it executes. Our adaptation framework is lightweight and transparent to the application and the managed execution environment: it requires neither recompilation of the application nor specially compiled versions of the managed execution runtime. Our prototype was implemented for the CLR. To evaluate our framework beyond toy examples, we searched SourceForge for potential target systems already implemented on the CLR that might benefit from runtime adaptation. We report on our experience using our prototype to effect runtime reconfigurations in a system that was developed and is in use by others: the Alchemi enterprise Grid Computing System developed at the University of Melbourne, Australia.

Columbia University Academic Commons

Multi-perspective Evaluation of Self-Healing Systems Using Simple Probabilistic Models

Author: Griffith Rean
Kaiser Gail E.
López Javier Alonso
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009

Quantifying the efficacy of self-healing systems is a challenging but important task, which has implications for increasing designer, operator and end-user confidence in these systems. During design, system architects benefit from tools and techniques that enhance their understanding of the system, allowing them to reason about the tradeoffs of proposed or existing self-healing mechanisms and about the overall effectiveness of the system that results from different compositions of mechanisms. At deployment time, system integrators and operators need to understand how the self-healing mechanisms work and how their operation impacts the system's reliability, availability and serviceability (RAS) in order to cope with any limitations of these mechanisms when the system is placed into production. In this paper we construct an evaluation framework for self-healing systems around simple, yet powerful, probabilistic models that capture the behavior of the system's self-healing mechanisms from multiple perspectives (designer, operator, and end-user). We combine these analytical models with runtime fault injection to study the operation of VM-Rejuv, a virtual-machine-based rejuvenation scheme for web-application servers. We use the results from the fault-injection experiments and model analysis to reason about the efficacy of VM-Rejuv, its limitations and strategies for managing/mitigating these limitations in system deployments. Although we use VM-Rejuv as the subject of our evaluation in this paper, our main contribution is a practical evaluation approach that can be generalized to other self-healing systems.

Crossref

Columbia University Academic Commons

Dynamic Adaptation of Rules for Temporal Event Correlation in Distributed Systems

Author: Diao Yixin
Griffith Rean
Hellerstein Joseph L.
Kaiser Gail E.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2005

Event correlation is essential to realizing self-managing distributed systems. For example, distributed systems often require that events from multiple systems be correlated using temporal patterns, to detect denial-of-service attacks and to warn of problems with business-critical applications that run on multiple servers. This paper addresses how to specify timer values for temporal patterns so as to manage the trade-off between false alarms and undetected alarms. A central concern is addressing the variability of event propagation delays due to factors such as contention for network and server resources. To this end, we develop an architecture and an adaptive control algorithm that dynamically compensate for variations in propagation delays. Our approach makes Management Stations more autonomic by avoiding the need for manual adjustment of timer values in temporal rules. Further, studies we conducted on a testbed system suggest that our approach produces results that are at least as good as an optimal fixed setting of timer values.

Columbia University Academic Commons
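
The papers' adaptive control algorithm is not reproduced here. As a swapped-in stand-in, the sketch below adapts the timer of a hypothetical two-event rule ("B must arrive within the timeout after A") using the smoothed mean/deviation estimator that TCP uses for its retransmission timer (RFC 6298), fed with observed propagation delays. `AdaptiveTimer`, `base_window`, `k` and the smoothing gains are assumptions. A timer that is too short misses matches whose second event was merely delayed in transit, while an unnecessarily long one widens the window in which unrelated events can be matched.

```python
from typing import Optional


class AdaptiveTimer:
    """Hypothetical timer for a rule "event B must follow event A within timeout".

    Propagation delays are tracked with an EWMA of the mean and of the mean
    absolute deviation (the RFC 6298 retransmission-timer estimator), and the
    timeout is the rule's nominal window plus mean + k * deviation.
    """

    def __init__(self, base_window: float, alpha: float = 0.125, beta: float = 0.25, k: float = 4.0) -> None:
        self.base_window = base_window  # nominal window the rule would use with zero delay
        self.alpha = alpha              # gain for the mean-delay estimate
        self.beta = beta                # gain for the deviation estimate
        self.k = k                      # safety factor on the deviation
        self.mean_delay: Optional[float] = None
        self.dev_delay = 0.0

    def observe_delay(self, delay: float) -> None:
        if self.mean_delay is None:
            # First sample: initialize as RFC 6298 does (mean = sample, deviation = sample / 2).
            self.mean_delay = delay
            self.dev_delay = delay / 2
        else:
            # Update the deviation with the previous mean, then the mean itself.
            self.dev_delay = (1 - self.beta) * self.dev_delay + self.beta * abs(delay - self.mean_delay)
            self.mean_delay = (1 - self.alpha) * self.mean_delay + self.alpha * delay

    @property
    def timeout(self) -> float:
        if self.mean_delay is None:
            return self.base_window
        return self.base_window + self.mean_delay + self.k * self.dev_delay


def correlated(t_a: float, t_b: float, timer: AdaptiveTimer) -> bool:
    """True if B's arrival time falls within the current timeout after A's."""
    return 0.0 <= t_b - t_a <= timer.timeout
```

As observed delays grow (for example, under network contention) the timeout stretches, and when they settle it shrinks back toward `base_window`; the sketch only illustrates replacing a fixed timer value with one driven by observed delays.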

Dynamic Adaptation of Temporal Event Correlation Rules

Author: Diao Yixin
Griffith Rean
Hellerstein Joseph
Kaiser Gail E.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2005

Temporal event correlation is essential to realizing self-managing distributed systems. Autonomic controllers often require that events be correlated across multiple components using rule patterns with timer-based transitions, e.g., to detect denial-of-service attacks and to warn of staging problems with business-critical applications. This short paper discusses automatic adjustment of timer values for event correlation rules, in particular compensating for the variability of event propagation delays due to factors such as contention for network and server resources. We describe a corresponding Management Station architecture and present experimental studies on a testbed system suggesting that this approach can produce results at least as good as an optimal fixed setting of timer values.

Columbia University Academic Commons

Deriving Utility from a Self-Healing Benchmark Report

Author: Griffith Rean
Kaiser Gail E.
Virmani Ritika
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006

Autonomic systems, specifically self-healing systems, currently lack an objective and relevant methodology for their evaluation. Because of their focus on problem detection, diagnosis and remediation, any evaluation methodology should facilitate an objective evaluation and/or comparison of these activities. Measures of "raw" performance are easily quantified and hence facilitate measurement and comparison on the basis of numbers. However, judging a system to be better at problem detection, diagnosis and remediation purely on the basis of performance measures is not useful. The proposed evaluation methodology will differ from traditional benchmarks, which are primarily concerned with measures of performance. To develop this methodology we rely on a set of experiments that will enable us to compare the self-healing capabilities of one system versus another. Because "real" self-healing systems are not currently available to us, we will simulate the behavior of some target self-healing systems, system faults and the operational and repair activities of target systems. Further, we will use the results derived from the simulation experiments to answer questions relevant to the utility of a benchmark report.

Columbia University Academic Commons
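
The simulation-based comparison can be sketched as a small Monte Carlo model, under assumptions not taken from the paper: each hypothetical system runs for an exponentially distributed time to failure, and each fault is healed automatically with probability `heal_prob` (taking an exponentially distributed auto-repair time) or otherwise needs a slower manual repair. The measured availability of two configurations is one example of a number a benchmark report could compare. `SystemProfile` and all parameter values are illustrative.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical self-healing system; all times share one unit (e.g. hours)."""
    name: str
    mttf: float         # mean time to failure
    heal_prob: float    # probability the system heals a fault without an operator
    auto_mttr: float    # mean time of an automated repair
    manual_mttr: float  # mean time of a manual repair


def simulate_availability(profile: SystemProfile, n_faults: int, seed: int = 0) -> float:
    """Fraction of simulated time the system is up over n_faults failure/repair cycles."""
    rng = random.Random(seed)
    up = down = 0.0
    for _ in range(n_faults):
        up += rng.expovariate(1.0 / profile.mttf)  # operate until the next fault
        if rng.random() < profile.heal_prob:
            down += rng.expovariate(1.0 / profile.auto_mttr)  # automated remediation
        else:
            down += rng.expovariate(1.0 / profile.manual_mttr)  # escalate to an operator
    return up / (up + down)


def expected_availability(profile: SystemProfile) -> float:
    # Renewal-reward argument: mean uptime per cycle over mean cycle length.
    mean_down = profile.heal_prob * profile.auto_mttr + (1 - profile.heal_prob) * profile.manual_mttr
    return profile.mttf / (profile.mttf + mean_down)
```

With the same `mttf` and repair times, raising `heal_prob` raises availability, and `expected_availability` gives the analytic value the simulation should approach as `n_faults` grows, which is a quick sanity check on the simulator itself.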