145 research outputs found

    Towards Middleware for Fault-tolerance in Distributed Real-time and Embedded Systems

    Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, that operate within resource-constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties requires the middleware to provide fault tolerance capabilities that respect the time-critical needs of DRE systems. Conventional middleware solutions, such as Fault-tolerant CORBA (FT-CORBA) and Continuous Availability API for J2EE, have limited utility for DRE systems because they are heavyweight (e.g., the complexity of their feature-rich fault tolerance capabilities consumes excessive runtime resources), yet incomplete (e.g., they lack mechanisms that enable fault tolerance while maintaining real-time predictability). This paper provides three contributions to the development and standardization of lightweight real-time and fault-tolerant middleware for DRE systems. First, we discuss the challenges in realizing real-time fault-tolerant solutions for DRE systems using contemporary middleware. Second, we describe recent progress towards standardizing a CORBA lightweight fault-tolerance specification for DRE systems. Third, we present the architecture of FLARe, which is a prototype based on the OMG real-time fault-tolerant CORBA middleware standardization efforts that is lightweight (e.g., leverages only those server- and client-side mechanisms required for real-time systems) and predictable (e.g., provides fault-tolerant mechanisms that respect the time-critical performance needs of DRE systems).

    MDDPro: Model-Driven Dependability Provisioning in Enterprise Distributed Real-Time and Embedded Systems

    Service-oriented architecture (SOA) design principles are increasingly being adopted to develop distributed real-time and embedded (DRE) systems, such as avionics mission computing, due to the availability of real-time component middleware platforms. Traditional approaches to fault tolerance that rely on replication and recovery of a single server or a single host do not work in this paradigm, since the fault management schemes must now account for the timely and simultaneous failover of groups of entities while improving system availability by minimizing the risk of simultaneous failures of replicated entities. This paper describes MDDPro, a model-driven dependability provisioning tool for DRE systems. MDDPro provides intuitive modeling abstractions to specify failover requirements of DRE systems at different granularities. MDDPro enables plugging in different replica placement algorithms to improve system availability. Finally, its generative capabilities automate the deployment and configuration of the DRE system on the underlying platforms.

    Replicated execution of workflows

    Workflows are the de facto standard for managing and optimizing business processes. Workflows allow businesses to automate interactions between business locations and partners residing anywhere on the planet. This, however, requires the workflows to be executed in a distributed and dynamic environment, where device and communication failures occur quite frequently. If a workflow execution becomes unavailable through such failures, the business operations that rely on the workflow might be hindered or even stopped, resulting in financial loss. Consequently, availability is a key concern when using workflows in dynamic environments. In this thesis, we propose replication schemes for workflow engines to ensure the availability of the workflows that are executed by these engines. Of course, a workflow that is executed by a replicated workflow engine has to yield the same result as a non-replicated execution of that workflow. To this end, we formally define the equivalence of a replicated and a non-replicated execution, called Single-Execution-Equivalence. Subsequently, we present replication schemes for both imperative and declarative workflow languages. Imperative workflow languages, such as the Web Service Business Process Execution Language (WS-BPEL), specify the execution order of activities through an ordering relation and are the predominant way of specifying workflow models. We implement a proof-of-concept to demonstrate the compatibility of our replication schemes with current (imperative) workflow technology. Declarative workflow languages provide greater flexibility by allowing the reordering of the activities within a workflow at run-time. We exploit this by executing differently ordered replicas on several nodes in the network to further improve availability.

    Passive Fault-Tolerance Management in Component-Based Embedded Systems

    It is imperative to accept that failures can and will occur even in meticulously designed distributed systems and to design proper measures to counter those failures. Passive replication minimizes resource consumption by only activating redundant replicas in case of failures, as providing and applying state updates is typically less resource-demanding than requesting execution. However, most existing solutions for passive fault tolerance are designed and configured at design time, explicitly and statically identifying the most critical components and their number of replicas, and so lack the flexibility needed to handle the runtime dynamics of distributed component-based embedded systems. This paper proposes a cost-effective adaptive fault tolerance solution with a significantly lower overhead compared to a strict active redundancy-based approach, achieving a high error coverage with a minimum amount of redundancy. The activation of passive replicas is coordinated through a feedback-based coordination model that reduces the complexity of the needed interactions among components until a new collective global service solution is determined, hence improving the overall maintainability and robustness of the system.

    Real-Time Reliable Middleware for Industrial Internet-of-Things

    This dissertation contributes to the area of adaptive real-time and fault-tolerant systems research, applied to Industrial Internet-of-Things (IIoT) systems. Heterogeneous timing and reliability requirements arising from IIoT applications have posed challenges for IIoT services to efficiently differentiate and meet such requirements. Specifically, IIoT services must both differentiate processing according to applications' timing requirements (including latency, event freshness, and relative consistency of each other) and enforce the needed levels of assurance for data delivery (even as far as ensuring zero data loss). It is nontrivial for an IIoT service to efficiently differentiate such heterogeneous IIoT timing/reliability requirements to fit each application, especially when facing increasingly large data traffic and when common fault-tolerant mechanisms tend to introduce latency and latency jitters. This dissertation presents a new adaptive real-time fault-tolerant framework for IIoT systems, along with efficient and adaptive strategies to meet each IIoT application's timing/reliability requirements. The contributions of the framework are demonstrated by three new IIoT middleware services: (1) Cyber-Physical Event Processing (CPEP), which both differentiates application-specific latency requirements and enforces cyber-physical timing constraints, by prioritizing, sharing, and shedding event processing. (2) Fault-Tolerant Real-Time Messaging (FRAME), which integrates real-time capabilities with a primary-backup replication system, to fit each application's unique timing and loss-tolerance requirements. (3) Adaptive Real-Time Reliable Edge Computing (ARREC), which leverages heterogeneous loss-tolerance requirements and their different temporal laxities, to perform selective and lazy (yet timely) data replication, thus allowing the system to meet needed levels of loss-tolerance while reducing both the latency and bandwidth penalties that are typical of fault-tolerant sub-systems.