5,322 research outputs found

    Robust execution of service workflows using redundancy and advance reservations

    No full text
    In this paper, we develop a novel algorithm that allows service consumers to execute business processes (or workflows) of interdependent services in a dependable manner within tight time-constraints. In particular, we consider large inter-organisational service-oriented systems, where services are offered by external organisations that demand financial remuneration and where their use has to be negotiated in advance using explicit service-level agreements (as is common in Grids and cloud computing). Here, different providers often offer the same type of service at varying levels of quality and price. Furthermore, some providers may be less trustworthy than others, possibly failing to meet their agreements. To control this unreliability and ensure end-to-end dependability while maximising the profit obtained from completing a business process, our algorithm automatically selects the most suitable providers. Moreover, unlike existing work, it reasons about the dependability properties of a workflow, and it controls these by using service redundancy for critical tasks and by planning for contingencies. Finally, our algorithm reserves services for only parts of its workflow at any time, in order to retain flexibility when failures occur. We show empirically that our algorithm consistently outperforms existing approaches, achieving up to a 35-fold increase in profit and successfully completing most workflows, even when the majority of providers fail

    Resilient Critical Infrastructure Management using Service Oriented Architecture

    No full text
    Abstract—The SERSCIS project aims to support the use of interconnected systems of services in Critical Infrastructure (CI) applications. The problem of system interconnectedness is aptly demonstrated by ‘Airport Collaborative Decision Making’ (ACDM). Failure or underperformance of any of the interlinked ICT systems may compromise the ability of airports to plan their use of resources to sustain high levels of air traffic, or to provide accurate aircraft movement forecasts to the wider European air traffic management systems. The proposed solution is to introduce further SERSCIS ICT components to manage dependability and interdependency. These use semantic models of the critical infrastructure, including its ICT services, to identify faults and potential risks and to increase human awareness of them. Semantics allows information and services to be described in such a way that makes them understandable to computers. Thus when a failure (or a threat of failure) is detected, SERSCIS components can take action to manage the consequences, including changing the interdependency relationships between services. In some cases, the components will be able to take action autonomously — e.g. to manage ‘local’ issues such as the allocation of CPU time to maintain service performance, or the selection of services where there are redundant sources available. In other cases the components will alert human operators so they can take action instead. The goal of this paper is to describe a Service Oriented Architecture (SOA) that can be used to address the management of ICT components and interdependencies in critical infrastructure systems. Index Terms—resilience; QoS; SOA; critical infrastructure, SLA

    Proactive cloud management for highly heterogeneous multi-cloud infrastructures

    Get PDF
    Various literature studies demonstrated that the cloud computing paradigm can help to improve availability and performance of applications subject to the problem of software anomalies. Indeed, the cloud resource provisioning model enables users to rapidly access new processing resources, even distributed over different geographical regions, that can be promptly used in the case of, e.g., crashes or hangs of running machines, as well as to balance the load in the case of overloaded machines. Nevertheless, managing a complex geographically-distributed cloud deploy could be a complex and time-consuming task. Autonomic Cloud Manager (ACM) Framework is an autonomic framework for supporting proactive management of applications deployed over multiple cloud regions. It uses machine learning models to predict failures of virtual machines and to proactively redirect the load to healthy machines/cloud regions. In this paper, we study different policies to perform efficient proactive load balancing across cloud regions in order to mitigate the effect of software anomalies. These policies use predictions about the mean time to failure of virtual machines. We consider the case of heterogeneous cloud regions, i.e regions with different amount of resources, and we provide an experimental assessment of these policies in the context of ACM Framework
    • 

    corecore