
    Preliminary Specification of Services and Protocols

    This document describes the preliminary specification of services and protocols for the CRUTIAL Architecture. The CRUTIAL Architecture definition, first addressed in the CRUTIAL Project Technical Report D4 (January 2007), intends to respond to a grand challenge of computer science and control engineering: how to achieve resilience of critical information infrastructures, in particular in the electrical sector. The definitions herein elaborate on the major architectural options and components established in the Preliminary Architecture Specification (D4), with special relevance to the CRUTIAL middleware building blocks, and are based on the fault, synchrony and topological models defined in the same document. In general terms, the document describes the Runtime Support Services and APIs and the Middleware Services and APIs, and then delves into the protocols, describing the Runtime Support Protocols and the Middleware Services Protocols. The Runtime Support Services and APIs chapter features, as its main component, the Proactive-Reactive Recovery Service, whose aim is to guarantee perpetual execution of any components it protects. The Middleware Services and APIs chapter describes our approach to intrusion-tolerant middleware. The middleware comprises several layers. The Multipoint Network layer is the lowest layer of CRUTIAL's middleware, and features an abstraction of basic communication services, such as those provided by standard protocols like IP, IPsec, UDP, TCP and SSL/TLS. The Communication Support Services feature two important building blocks: the Randomized Intrusion-Tolerant Services (RITAS) and the Overlay Protection Layer (OPL) against DoS attacks. The Activity Support Services currently defined comprise the CIS Protection service and the Access Control and Authorization service. Protection as described in this report is implemented by mechanisms and protocols residing on a device called the CRUTIAL Information Switch (CIS). The Access Control and Authorization service is implemented through PolyOrBAC, which defines the rules for information exchange and collaboration between sub-modules of the architecture, corresponding in fact to different facilities of the CII's organizations. The Monitoring and Failure Detection layer contains a preliminary definition of the middleware services devoted to monitoring and failure detection activities. The remaining chapters describe the protocols implementing the above-mentioned services: the Runtime Support Protocols and the Middleware Services Protocols.
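
    To make the role of the Proactive-Reactive Recovery Service more concrete, the sketch below shows a minimal scheduler that rejuvenates replicas periodically (proactively) and on demand when an intrusion is suspected (reactively). It is an illustration only: the Replica class, the method names and the single scheduling loop are assumptions for exposition, not the CRUTIAL interface or protocol.

```python
# Illustrative sketch only: a minimal proactive-reactive recovery scheduler.
# The Replica interface and timing parameters are assumptions, not CRUTIAL's API.
import time


class Replica:
    """Hypothetical replica handle; restart_from_clean_state() rejuvenates it."""

    def __init__(self, name: str) -> None:
        self.name = name

    def restart_from_clean_state(self) -> None:
        print(f"[recovery] rejuvenating {self.name}")


class ProactiveReactiveRecovery:
    """Periodically rejuvenates replicas (proactive) and recovers on suspicion (reactive)."""

    def __init__(self, replicas, period_s: float) -> None:
        self.replicas = replicas
        self.period_s = period_s
        self._next = 0  # index of the next replica to rejuvenate proactively

    def run_proactive(self, rounds: int) -> None:
        # Rejuvenate one replica per period so that a quorum stays available.
        for _ in range(rounds):
            time.sleep(self.period_s)
            self.replicas[self._next].restart_from_clean_state()
            self._next = (self._next + 1) % len(self.replicas)

    def on_intrusion_suspected(self, replica: Replica) -> None:
        # Reactive path: recover immediately when a detector flags a replica.
        replica.restart_from_clean_state()


if __name__ == "__main__":
    service = ProactiveReactiveRecovery([Replica(f"CIS-{i}") for i in range(4)], period_s=1.0)
    service.on_intrusion_suspected(service.replicas[2])  # reactive recovery
    service.run_proactive(rounds=2)                      # two proactive rounds
```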

    Proactive resilience

    Doctoral thesis in Informatics (Computer Science), presented to the Universidade de Lisboa through the Faculdade de Ciências, 2007. Available in the document

    Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

    Cloud computing systems fail in complex and unexpected ways, due to unforeseen combinations of events and interactions between hardware and software components. Fault injection is an effective means to bring out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and manually analyzing these data is inefficient and error-prone, as the analyst can miss severe failure modes that are yet unknown. This paper introduces a new paradigm (fault injection analytics) that applies unsupervised machine learning on execution traces of the injected system to ease the discovery and interpretation of failure modes. We evaluated the proposed approach in the context of fault injection experiments on the OpenStack cloud computing platform, where we show that the approach can accurately identify failure modes with a low computational cost. Comment: IEEE Transactions on Dependable and Secure Computing; 16 pages. arXiv admin note: text overlap with arXiv:1908.1164
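
    As a rough illustration of the kind of analysis the paper automates, the sketch below clusters symbolic execution traces with an off-the-shelf unsupervised algorithm so that each cluster suggests a candidate failure mode. The trace encoding, the event names and the DBSCAN parameters are assumptions chosen for illustration, not the authors' actual pipeline.

```python
# Minimal sketch of unsupervised clustering of execution traces to surface
# failure modes; trace format and parameters are assumptions. Needs scikit-learn.
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Each fault injection experiment yields a trace: here, a space-separated list of
# observed events (API errors, log patterns, failed assertions).
traces = [
    "nova_api_error instance_stuck_in_build",
    "nova_api_error instance_stuck_in_build",
    "neutron_timeout port_not_created",
    "no_observable_failure",
]

# Encode traces as TF-IDF vectors so that similar failure symptoms end up close together.
vectors = TfidfVectorizer(token_pattern=r"\S+").fit_transform(traces)

# Density-based clustering: each cluster is a candidate failure mode;
# label -1 would mark outliers worth manual inspection.
labels = DBSCAN(eps=0.5, min_samples=1, metric="cosine").fit_predict(vectors)
for trace, label in zip(traces, labels):
    print(f"cluster {label}: {trace}")
```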

    Architecture, Services and Protocols for CRUTIAL

    This document describes the complete specification of the architecture, services and protocols of the CRUTIAL project. The CRUTIAL Architecture intends to respond to a grand challenge of computer science and control engineering: how to achieve resilience of critical information infrastructures (CII), in particular in the electrical sector. In general terms, the document starts by presenting the main architectural options and components of the architecture, with a special emphasis on a protection device called the CRUTIAL Information Switch (CIS). Given the various criticality levels of the equipment that has to be protected, and the cost of using a replicated device, we define a hierarchy of incrementally more resilient CIS designs. The different CIS designs offer various trade-offs in terms of capabilities to prevent and tolerate intrusions, both in the device itself and in the information infrastructure. The Middleware Services, APIs and Protocols chapter describes our approach to intrusion-tolerant middleware. The CRUTIAL middleware comprises several building blocks that are organized into a set of layers. The Multipoint Network layer is the lowest layer of the middleware, and features an abstraction of basic communication services, such as those provided by standard protocols like IP, IPsec, UDP, TCP and SSL/TLS. The Communication Support layer features three important building blocks: the Randomized Intrusion-Tolerant Services (RITAS), the CIS Communication service and the Fosel service for mitigating DoS attacks. The Activity Support layer comprises the CIS Protection service and the Access Control and Authorization service. The Access Control and Authorization service is implemented through PolyOrBAC, which defines the rules for information exchange and collaboration between sub-modules of the architecture, corresponding in fact to different facilities of the CII’s organizations. The Monitoring and Failure Detection layer contains a definition of the services devoted to monitoring and failure detection activities. The Runtime Support Services, APIs and Protocols chapter features, as its main component, the Proactive-Reactive Recovery service, whose aim is to guarantee perpetual correct execution of any components it protects. Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006).
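
    To give a feel for the Access Control and Authorization service, the sketch below evaluates OrBAC-style abstract rules over organisation, role, activity, view and context, which is the kind of decision PolyOrBAC distributes across collaborating organisations. The class names, the rule contents and the API are assumptions for illustration, not the CRUTIAL implementation.

```python
# Illustrative sketch only: an OrBAC-style permission check of the kind PolyOrBAC
# builds on; rule contents and the API are invented for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    organisation: str
    role: str
    activity: str
    view: str
    context: str


class AccessController:
    def __init__(self, rules: set[Rule]) -> None:
        self.rules = rules

    def is_permitted(self, organisation: str, role: str, activity: str,
                     view: str, context: str) -> bool:
        # Permission holds only if an abstract rule matches all five dimensions.
        return Rule(organisation, role, activity, view, context) in self.rules


if __name__ == "__main__":
    rules = {Rule("DistributionOperator", "operator", "read", "substation_measurements", "normal")}
    ac = AccessController(rules)
    print(ac.is_permitted("DistributionOperator", "operator", "read",
                          "substation_measurements", "normal"))   # True
    print(ac.is_permitted("DistributionOperator", "operator", "write",
                          "substation_breaker", "emergency"))     # False
```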

    Autonomous Recovery in Componentized Internet Applications

    In this paper we show how to reduce downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring application modifications. Our prototype combines three application-agnostic techniques: macroanalysis for fault detection and localization, microrebooting for rapid recovery, and external management of recovery actions. The individual techniques are autonomous and work across a wide range of componentized Internet applications, making them well-suited to the rapidly changing software of Internet services. The proposed framework has been integrated with JBoss, an open-source J2EE application server. Our prototype provides an execution platform that can automatically recover J2EE applications within seconds of the manifestation of a fault. Our system can provide a subset of a system's active end users with the illusion of continuous uptime, in spite of failures occurring behind the scenes, even when there is no functional redundancy in the system.
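
    The recovery logic can be pictured as a small external manager that first microreboots only the suspect component and escalates to a full reboot if the anomaly persists. The sketch below captures that idea in Python as a language-neutral illustration; the component names, the manager API and the escalation policy are assumptions, not the JBoss-based prototype described above.

```python
# Minimal sketch of the microreboot idea: restart only the suspect component,
# not the whole application; names and API are assumptions.
import time


class Component:
    def __init__(self, name: str) -> None:
        self.name = name
        self.started_at = time.time()

    def reboot(self) -> None:
        # A microreboot discards only this component's in-memory state.
        self.started_at = time.time()
        print(f"[microreboot] {self.name} restarted")


class RecoveryManager:
    """External recovery manager: escalates from component- to application-level reboot."""

    def __init__(self, components: dict[str, Component]) -> None:
        self.components = components

    def on_anomaly(self, component_name: str, previously_rebooted: bool = False) -> None:
        if not previously_rebooted:
            self.components[component_name].reboot()          # cheap, localised recovery
        else:
            for component in self.components.values():        # escalate: full reboot
                component.reboot()


if __name__ == "__main__":
    manager = RecoveryManager({n: Component(n) for n in ("cart", "checkout", "catalog")})
    manager.on_anomaly("checkout")                            # first try: microreboot
    manager.on_anomaly("checkout", previously_rebooted=True)  # still failing: escalate
```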

    The emergence and development of knowledge intensive mining service suppliers in the late 20th century

    During the late 20th Century the mining industry went through an important technological rejuvenation that drove high rates of innovation, productivity growth and organisational change. This process included the emergence of knowledge-intensive mining services (KIMS) suppliers, who performed functions outsourced by mining companies, gradually strengthening their capabilities, enlarging their geographical scope and becoming a globally organised sector. But this was uneven across different mining economies. For instance, while numerous Australian KIMS suppliers emerged and achieved international competitiveness, few did this in Chile. Focusing on Chile, this thesis explores the reasons for the limited development of KIMS suppliers in a developing mining economy. It examines the technological learning that shaped the KIMS sector evolution in Chile by contrasting it with the Australian experience, using a two level learning model that integrates: (1) the interaction between industry-level factors that shaped the potential for learning at the micro-level; and (2) the interaction at the micro-level between accumulated capabilities and learning efforts by firms to exploit the potential for learning. KIMS learning is examined over four stages: (i) Gestation (1940s - early 1970s); (ii) Emergence and Development (mid-1970s to early 1980s); (iii) Internationalisation (late 1980s to late 1990s); and (iv) Consolidation (early 2000s and still going on). Over these stages, KIMS sector learning was much more limited in Chile than Australia, either because there was a lower learning potential and/or because firms carried out limited learning efforts to exploit the potential. At the first stage mining companies in Chile played a weak role as incubators of KIMS capabilities. Consequently, during the second stage there were few KIMS suppliers capable of profiting from the rejuvenation being experienced by the global industry. Also, with limited stimuli from the growth of mining in Chile, suppliers undertook limited learning efforts. So, the third stage found Chilean KIMS suppliers unprepared to exploit the learning potential that came with internationalisation; and the learning opportunities inherent in the significant expansion of Chilean mining production were captured by foreign KIMS suppliers, including Australians. Accordingly, Chilean KIMS suppliers started the Consolidation Stage without the capabilities to overcome the increasing barriers to participation in the industry's continuing high learning potential.

    Integrated Software Architecture-Based Reliability Prediction for IT Systems

    With the increasing importance of reliability in business and industrial IT systems, new techniques for architecture-based software reliability prediction are becoming an integral part of the development process. This dissertation thesis introduces a novel reliability modelling and prediction technique that considers the software architecture with its component structure, control and data flow, recovery mechanisms, its deployment to distributed hardware resources and the system's usage profile.
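
    As a back-of-the-envelope illustration of architecture-based reliability prediction, the sketch below computes system reliability as the usage-profile-weighted product of the reliabilities of the components visited on each execution path. The numbers and the path-based simplification are assumptions for illustration; the thesis's actual model additionally covers control and data flow, recovery behaviour and deployment.

```python
# Sketch only: usage-profile-weighted, path-based reliability estimate.
import math

# Per-component reliability (probability of failure-free execution per call). Assumed values.
component_reliability = {"frontend": 0.9999, "business_logic": 0.9995, "database": 0.999}

# Usage profile: probability of each execution path and the components it visits.
usage_profile = [
    (0.7, ["frontend", "business_logic"]),               # read-only request
    (0.3, ["frontend", "business_logic", "database"]),   # transactional request
]

system_reliability = sum(
    p * math.prod(component_reliability[c] for c in path)
    for p, path in usage_profile
)
print(f"predicted system reliability per request: {system_reliability:.6f}")
```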

    Website Performance Evaluation and Estimation in an E-business Environment

    This thesis introduces a new Predictus-model for performance evaluation and estimation in a multi-layer website environment. The model is based on soft computing ideas, i.e. simulation and statistical analysis. The aims are to reduce the energy consumption of the website's hardware, to improve investment efficiency, and to avoid loss of availability. Optimised exploitation yields reduced energy and maintenance costs on the one hand and increased end-user satisfaction, due to robust and stable web services, on the other. A method based on simulation of user requests is described. Instead of an ordinary static parameter set, parameters are extracted dynamically from previous log files. The distribution of existing requests is exploited to generate a natural load based on actual usage. Because the server system is loaded with valid, well-known requests, its behaviour remains natural. A feedback loop on workload generation ensures the validity of the workload in the long term. A method for identifying the actual performance of the website is also described. Using the well-known load to simulate usage by a large number of virtual users, and observing the utilisation rate of server resources, provides the best information about the internal state of the system. Disturbance of the live website can be avoided by using mathematical extrapolation to determine the saturation point of each individual server resource.
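
    The sketch below illustrates the two ingredients described above: deriving the request mix from previous log files instead of a static parameter set, and extrapolating measured resource utilisation to estimate the saturation point. The log format, the utilisation samples and the linear extrapolation are assumptions for illustration, not the Predictus-model itself.

```python
# Sketch only: log-driven request sampling plus linear extrapolation to saturation.
import collections
import random

# (1) Empirical request distribution extracted from an access log (one URL per line).
log_lines = ["/home", "/home", "/search?q=a", "/cart", "/home", "/search?q=b"]
counts = collections.Counter(line.split("?")[0] for line in log_lines)
urls, weights = zip(*counts.items())

def next_virtual_user_request() -> str:
    # Sample a request so the generated load follows the observed distribution.
    return random.choices(urls, weights=weights)[0]

# (2) Utilisation measured at a few simulated load levels (requests/s -> CPU share).
measurements = [(100, 0.22), (200, 0.41), (300, 0.63)]
(x1, y1), (x2, y2) = measurements[0], measurements[-1]
slope = (y2 - y1) / (x2 - x1)
saturation_load = x1 + (1.0 - y1) / slope   # load at which CPU utilisation reaches 100%
print(f"sample request: {next_virtual_user_request()}")
print(f"estimated saturation point: {saturation_load:.0f} requests/s")
```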