2,634 research outputs found

    MORPH: A Reference Architecture for Configuration and Behaviour Self-Adaptation

    An architectural approach to self-adaptive systems involves runtime change of system configuration (i.e., the system's components, their bindings and operational parameters) and behaviour update (i.e., component orchestration). Thus, dynamic reconfiguration and discrete event control theory are at the heart of architectural adaptation. Although controlling configuration and behaviour at runtime has been discussed and applied to architectural adaptation, architectures for self-adaptive systems often compound these two aspects, reducing the potential for adaptability. In this paper we propose a reference architecture that allows for coordinated yet transparent and independent adaptation of system configuration and behaviour.

    Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud

    With the advent of cloud computing, organizations are nowadays able to react rapidly to changing demands for computational resources. Not only individual applications can be hosted on virtual cloud infrastructures, but also complete business processes. This allows the realization of so-called elastic processes, i.e., processes which are carried out using elastic cloud resources. Despite the manifold benefits of elastic processes, there is still a lack of solutions supporting them. In this paper, we identify the state of the art of elastic Business Process Management with a focus on infrastructural challenges. We conceptualize an architecture for an elastic Business Process Management System and discuss existing work on scheduling, resource allocation, monitoring, decentralized coordination, and state management for elastic processes. Furthermore, we present two representative elastic Business Process Management Systems which are intended to counter these challenges. Based on our findings, we identify open issues and outline possible research directions for the realization of elastic processes and elastic Business Process Management.
    Comment: Please cite as: S. Schulte, C. Janiesch, S. Venugopal, I. Weber, and P. Hoenisch (2015). Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud. Future Generation Computer Systems, Volume NN, Number N, NN-NN., http://dx.doi.org/10.1016/j.future.2014.09.00

    Self-management for large-scale distributed systems

    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time consuming and error prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. 
Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables a tradeoff between high availability and data consistency. Using a majority avoids the potential drawbacks of master-based consistency control, namely a single point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with a focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that makes it possible to trade off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control.
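
The majority-based idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the class names `Replica` and `MajorityStore`, the version-counter scheme, and the in-memory, single-client setting are all assumptions made for illustration. It only shows why reading from and writing to any majority of replicas lets a read observe the latest completed write without a master node.

```python
import random


class Replica:
    """Hypothetical in-memory replica holding key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self.data.get(key, (0, None))

    def write(self, key, version, value):
        # Keep only the newest version seen for a key.
        if version > self.read(key)[0]:
            self.data[key] = (version, value)


class MajorityStore:
    """Illustrative store: every operation contacts a majority of replicas.

    Assumes sequential clients and no failures; both are simplifications.
    """

    def __init__(self, n_replicas=5, seed=0):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.quorum = n_replicas // 2 + 1
        self.rng = random.Random(seed)

    def _pick_quorum(self):
        # Any majority will do; choose one at random to mimic availability.
        return self.rng.sample(self.replicas, self.quorum)

    def put(self, key, value):
        # Phase 1: learn the highest version from a majority.
        version = max(r.read(key)[0] for r in self._pick_quorum()) + 1
        # Phase 2: store the new version on a (possibly different) majority.
        for replica in self._pick_quorum():
            replica.write(key, version, value)
        return version

    def get(self, key):
        # Any two majorities overlap, so the newest version is always seen.
        answers = [r.read(key) for r in self._pick_quorum()]
        return max(answers, key=lambda pair: pair[0])[1]
```

Because any two majorities of the same replica set share at least one member, `get` in this sketch always reaches a replica that holds the most recent completed `put`, whichever replicas are chosen.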

    Capacity Management for Cloud Computing: A System Dynamics Approach

    As the demand for cloud computing as a preferred computing architecture grows, the need for effective capacity planning by cloud providers becomes crucial for their long-term viability. Situations involving under-capacity and over-capacity represent lost opportunities and increased overhead. Economic conditions play a critical role in determining the capacity, cost, and revenue of cloud-based services. Using a system dynamics approach, this study evaluates the different conditions in the cloud ecosystem from a capacity planning and management perspective, with a view to providing cloud service providers with guidance on cloud capacity building strategies.

    Modular Coordination of Multiple Autonomic Managers

    Complex computing systems are increasingly self-adaptive, with an autonomic computing approach for their administration. Real systems require the co-existence of multiple autonomic management loops, each complex to design. However, their uncoordinated co-existence leads to performance degradation and possibly to inconsistency. There is a need for methodological support facilitating the coordination of multiple autonomic managers. In this paper we propose a method focusing on the discrete control of the interactions of managers. We follow a component-based approach and explore modular discrete control, which allows breaking down the combinatorial complexity inherent to the state-space exploration technique. This improves the scalability of the approach and allows constructing a hierarchical control. It also allows re-using complex managers in different contexts without modifying their control specifications. We build a component-based coordination of managers, with introspection, adaptivity and reconfiguration. We validate our method on a multiple-loop multi-tier system.

    Trustworthy autonomic architecture (TAArch): Implementation and empirical investigation

    This paper presents a new architecture for trustworthy autonomic systems. This trustworthy autonomic architecture is different from the traditional autonomic computing architecture and includes mechanisms and instrumentation to explicitly support run-time self-validation and trustworthiness. The state of practice does not lend itself robustly enough to support trustworthiness and system dependability. For example, despite validating a system's decisions within a logical boundary set for the system, there is the possibility of overall erratic behaviour or inconsistency emerging in the system, for example at a different logical level or on a different time scale. So a more thorough and holistic approach, with a higher level of checking, is required to convincingly address the dependability and trustworthiness concerns. Validation alone does not always guarantee trustworthiness, as each individual decision could be correct (validated) but the overall system may not be consistent and thus not dependable. A robust approach requires that validation and trustworthiness are designed in and integral at the architectural level, and not treated as add-ons, as they cannot be reliably retro-fitted to systems. This paper analyses the current state of practice in autonomic architecture, presents a different architectural approach for trustworthy autonomic systems, and uses a datacentre scenario as the basis for empirical analysis of behaviour and performance. Results show that the proposed trustworthy autonomic architecture has significant performance improvement over existing architectures and can be relied upon to operate (or manage) almost all levels of datacentre scale and complexity.

    The Architecture of an Autonomic, Resource-Aware, Workstation-Based Distributed Database System

    Distributed software systems that are designed to run over workstation machines within organisations are termed workstation-based. Workstation-based systems are characterised by dynamically changing sets of machines that are used primarily for other, user-centric tasks. They must be able to adapt to and utilize spare capacity when and where it is available, and ensure that the non-availability of an individual machine does not affect the availability of the system. This thesis focuses on the requirements and design of a workstation-based database system, which is motivated by an analysis of existing database architectures that are typically run over static, specially provisioned sets of machines. A typical clustered database system -- one that is run over a number of specially provisioned machines -- executes queries interactively, returning a synchronous response to applications, with its data made durable and resilient to the failure of machines. There are no existing workstation-based databases. Furthermore, other workstation-based systems do not attempt to achieve the requirements of interactivity and durability, because they are typically used to execute asynchronous batch processing jobs that tolerate data loss -- results can be re-computed. These systems use external servers to store the final results of computations rather than workstation machines. This thesis describes the design and implementation of a workstation-based database system and investigates its viability by evaluating its performance against existing clustered database systems and testing its availability during machine failures.
    Comment: Ph.D. Thesis

    A feedback-based decentralised coordination model for distributed open real-time systems

    Moving towards autonomous operation and management of increasingly complex open distributed real-time systems poses very significant challenges. This is particularly true when reaction to events must be done in a timely and predictable manner while guaranteeing Quality of Service (QoS) constraints imposed by users, the environment, or applications. In these scenarios, the system should be able to maintain a global feasible QoS level while allowing individual nodes to autonomously adapt under different constraints of resource availability and input quality. This paper shows how decentralised coordination of a group of autonomous interdependent nodes can emerge with little communication, based on the robust self-organising principles of feedback. Positive feedback is used to reinforce the selection of the new desired global service solution, while negative feedback discourages nodes from acting in a greedy fashion, as this adversely impacts the service levels provided at neighbouring nodes. The proposed protocol is general enough to be used in a wide range of scenarios characterised by a high degree of openness and dynamism where coordination tasks need to be time dependent. As the reported results demonstrate, it requires fewer messages to be exchanged and reaches a globally acceptable near-optimal solution faster than other available approaches.

    Developing Real-Time Emergency Management Applications: Methodology for a Novel Programming Model Approach

    Recent years have been characterized by the rise of highly distributed computing platforms composed of heterogeneous computing and communication resources, including centralized high-performance computing architectures (e.g. clusters or large shared-memory machines) as well as multi-/many-core components also integrated into mobile nodes and network facilities. The emergence of computational paradigms such as Grid and Cloud Computing provides potential solutions to integrate such platforms with data systems, natural phenomena simulations, knowledge discovery and decision support systems responding to a dynamic demand for remote computing and communication resources and services. In this context time-critical applications, notably emergency management systems, are composed of complex sets of application components specialized for executing specific computations, which are able to cooperate in such a way as to achieve a global goal in a distributed manner. In recent years the scientific community has been addressing the programming issues of distributed systems, aiming at the definition of applications featuring an increasing complexity in the number of distributed components, in the spatial distribution and cooperation between interested parties and in their degree of heterogeneity. Over the last decade the research trend in distributed computing has been focused on a crucial objective. The wide-ranging composition of distributed platforms in terms of different classes of computing nodes and network technologies, and the strong diffusion of applications that require real-time elaborations and online compute-intensive processing, as in the case of emergency management systems, lead to a pronounced tendency of systems towards properties like self-management, self-organization, self-control and, strictly speaking, adaptivity.
Adaptivity implies the development, deployment, execution and management of applications that, in general, are dynamic in nature. Dynamicity concerns the number and the specific identification of cooperating components, the deployment and composition of the most suitable versions of software components on processing and networking resources and services, i.e., both the quantity and the quality of the application components to achieve the needed Quality of Service (QoS). In time-critical applications the QoS specification can dynamically vary during the execution, according to the user intentions and the information produced by sensors and services, as well as according to the monitored state and performance of networks and nodes. (Authors: Gabriele Mencagli and Marco Vanneschi, Department of Computer Science, University of Pisa, L. Bruno Pontecorvo, Pisa, Italy.) The general reference point for this kind of system is the Grid paradigm which, by definition, aims to enable the access, selection and aggregation of a variety of distributed and heterogeneous resources and services. However, though notable advancements have been achieved in recent years, current Grid technology is not yet able to supply the needed software tools with the features of high adaptivity, ubiquity, proactivity, self-organization, scalability and performance, interoperability, as well as fault tolerance and security, of the emerging applications. For this reason in this chapter we will study a methodology for designing high-performance computations able to exploit the heterogeneity and dynamicity of distributed environments by expressing adaptivity and QoS-awareness directly at the application level. An effective approach needs to address issues like QoS predictability of different application configurations as well as the predictability of reconfiguration costs.
Moreover, adaptation strategies need to be developed that assure properties like the stability degree of a reconfiguration decision and the execution optimality (i.e. selecting reconfigurations that account for proper trade-offs among different QoS objectives). In this chapter we will present the basic points of a novel approach that lays the foundations for future programming model environments for time-critical applications such as emergency management systems. The organization of this chapter is the following. In Section 2 we will compare the existing research works for developing adaptive systems in critical environments, highlighting their drawbacks and inefficiencies. In Section 3, in order to clarify the application scenarios that we are considering, we will present an emergency management system in which the run-time selection of proper application configuration parameters is of great importance for meeting the desired QoS constraints. In Section 4 we will describe the basic points of our approach in terms of how compute-intensive operations can be programmed, how they can be dynamically modified and how adaptation strategies can be expressed. In Section 5 our approach will be contextualized to the definition of an adaptive parallel module, which is a building block for composing complex and distributed adaptive computations. Finally, in Section 6 we will describe a set of experimental results that show the viability of our approach, and in Section 7 we will give the concluding remarks of this chapter.

    Requirements of the SALTY project

    This document is the first external deliverable of the SALTY project (Self-Adaptive very Large disTributed sYstems), funded by the ANR under contract ANR-09-SEGI-012. It is the result of task 1.1 of Work Package (WP) 1: Requirements and Architecture. Its objective is to identify and collect requirements from use cases that are going to be developed in WP 4 (Use cases and Validation). Based on the study and classification of the use cases, requirements against the envisaged framework are then determined and organized in features. These features will aim to guide and control the advances in all work packages of the project. As a start, features are classified, briefly described and related scenarios in the defined use cases are pinpointed. In the following tasks and deliverables, these features will facilitate design by assigning priorities to them and defining success criteria at a finer grain as the project progresses. This report, as the first external document, has no dependency on any other external documents and serves as a reference to future external documents. As it has been built from the use case studies that have been synthesized in two internal documents of the project, extracts from the two documents are made available as appendices (cf. appendices B and C).