1,393 research outputs found

    Architecting fault-tolerant software systems

    The increasing size and complexity of software systems make it hard to prevent or remove all possible faults. Faults that remain in the system can eventually lead to a system failure. Fault tolerance techniques enable systems to recover and continue operation when they are subject to faults. Many fault tolerance techniques are available, but incorporating them in a system is not always trivial. We consider the following problems in designing a fault-tolerant system. First, existing reliability analysis techniques generally do not prioritize potential failures from the end-user perspective and accordingly do not identify sensitivity points of a system. Second, existing architecture styles are not well suited for specifying, communicating and analyzing design decisions that are particularly related to the fault-tolerant aspects of a system. Third, there are no adequate analysis techniques that evaluate the impact of fault tolerance techniques on the functional decomposition of software architecture. Fourth, realizing a fault-tolerant design usually requires a substantial development and maintenance effort. To tackle the first problem, we propose a scenario-based software architecture reliability analysis method, called SARAH, that builds on mature reliability engineering techniques (i.e., FMEA, FTA) to provide an early reliability analysis of the software architecture design. SARAH evaluates potential failures from the end-user perspective to identify sensitive points of a system without requiring an implementation. As a new architectural style, we introduce Recovery Style for specifying fault-tolerant aspects of software architecture. Recovery Style is used for communicating and analyzing architectural design decisions and for supporting detailed design with respect to recovery. As a solution for the third problem, we propose a systematic method for optimizing the decomposition of software architecture for local recovery, which is an effective fault tolerance technique to attain high system availability. To support the method, we have developed an integrated set of tools that employ optimization techniques, state-based analytical models (i.e., CTMCs) and dynamic analysis on the system. The method enables the following: i) modeling the design space of the possible decomposition alternatives, ii) reducing the design space with respect to domain and stakeholder constraints and iii) making the desired trade-off between availability and performance metrics. To reduce the development and maintenance effort, we propose a framework, FLORA, that supports the decomposition and implementation of software architecture for local recovery. The framework provides reusable abstractions for defining recoverable units and for incorporating the necessary coordination and communication protocols for recovery.

    Software dependability modeling using an industry-standard architecture description language

    Performing dependability evaluation along with other analyses at the architectural level makes it possible both to make architectural tradeoffs and to predict the effects of architectural decisions on the dependability of an application. This paper gives guidelines for building architectural dependability models for software systems using the AADL (Architecture Analysis and Design Language). It presents reusable modeling patterns for fault-tolerant applications and shows how the presented patterns can be used in the context of a subsystem of a real-life application.

    Academic Panel: Can Self-Managed Systems be trusted?

    Trust can be defined as: to have confidence or faith in; a form of reliance or certainty based on past experience; to allow without fear; believe; hope; expect and wish; and extend credit to. The issue of trust in computing has always been a hot topic, especially notable with the proliferation of services over the Internet, which has brought the issue of trust and security right into the ordinary home. Autonomic computing brings its own complexity to this. With systems that self-manage, the internal decision-making process is less transparent and the ‘intelligence’ is possibly evolving and becoming less tractable. Such systems may be used for anything from environment monitoring to looking after Granny in the home, and thus the issue of trust is imperative. To this end, we have organised this panel to examine some of the key aspects of trust. The first section discusses the issues of self-management when applied across organizational boundaries. The second section explores predictability in self-managed systems. The third part examines how trust is manifest in electronic service communities. The final discussion demonstrates how trust can be integrated into an autonomic system as the core intelligence on which to base adaptivity choices.

    Software engineering and middleware: a roadmap (Invited talk)

    The construction of a large class of distributed systems can be simplified by leveraging middleware, which is layered between network operating systems and application components. Middleware resolves heterogeneity and facilitates communication and coordination of distributed components. Existing middleware products enable software engineers to build systems that are distributed across a local-area network. State-of-the-art middleware research aims to push this boundary towards Internet-scale distribution, adaptive and reconfigurable middleware, and middleware for dependable and wireless systems. The challenge for software engineering research is to devise notations, techniques, methods and tools for distributed system construction that systematically build and exploit the capabilities that middleware delivers.