Self-management for large-scale distributed systems
Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management.
In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture, simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers.
In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables a tradeoff between high availability and data consistency. Using a majority avoids potential drawbacks of master-based consistency control, namely, a single point of failure and a potential performance bottleneck.
In the third part, we investigate self-management for Cloud-based storage systems with a focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a state-space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that enables trading off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control.
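The majority-based consistency idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm or the Niche API: the class names `Replica` and `MajorityStore`, the in-process replicas, and the version-number scheme are all assumptions made for illustration, and concurrent writers, networking, and churn are ignored. The sketch only shows why reading from any majority returns the latest value written to a majority: two majorities of the same replica set always share at least one replica.

```python
import random


class Replica:
    """Hypothetical in-process replica holding versioned values."""

    def __init__(self):
        self.store = {}      # key -> (version, value)
        self.alive = True

    def read(self, key):
        return self.store.get(key, (0, None))

    def write(self, key, version, value):
        # Keep only the newest version seen for a key.
        if version > self.read(key)[0]:
            self.store[key] = (version, value)


class MajorityStore:
    """Toy quorum store: every read and write contacts a majority."""

    def __init__(self, n_replicas=5, seed=0):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.quorum = n_replicas // 2 + 1
        self.rng = random.Random(seed)

    def _pick_quorum(self):
        alive = [r for r in self.replicas if r.alive]
        if len(alive) < self.quorum:
            raise RuntimeError("no majority available")
        return self.rng.sample(alive, self.quorum)

    def put(self, key, value):
        # Phase 1: learn the highest version from one majority.
        version = max(r.read(key)[0] for r in self._pick_quorum())
        # Phase 2: store the next version on a (possibly different) majority.
        for replica in self._pick_quorum():
            replica.write(key, version + 1, value)

    def get(self, key):
        # The newest version in any majority is the last completed write.
        answers = [r.read(key) for r in self._pick_quorum()]
        return max(answers, key=lambda pair: pair[0])[1]
```

With five replicas, the sketch keeps serving correct reads after two replicas fail and refuses to answer once a majority is lost, which is the availability versus consistency tradeoff the abstract refers to.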
Collaborative Policy-Based Autonomic Management in IaaS Clouds
With the increasing number of machines (either virtual or physical) in a computing environment, it is becoming harder to monitor and manage these resources. Relying on human administrators, even with tools, is expensive, and the growing complexity makes management even harder. The alternative is to look for automated approaches that can monitor and manage computing resources in real time with no human intervention. One approach to this problem is policy-based autonomic management. However, in large systems, having a single autonomic manager manage everything is almost impossible. Therefore, multiple autonomic managers will be needed, and these will need to cooperate in the overall management. We propose a management model using multiple autonomic managers organized in a hierarchical fashion to monitor and manage the resources in a computing environment based on provided policies. We develop a communication protocol to facilitate collaboration between different autonomic managers, define the core operations of these managers, and introduce algorithms to deal with their deployment and operation. We also introduce an approach for inferring the communication messages from policies and develop several algorithms for joining and maintaining the management hierarchy. We propose a deployment system that can automatically discover relevant resources in a computing environment to facilitate the deployment of autonomic managers at different levels of a physical system. We then test our approach by implementing it in a small private Infrastructure-as-a-Service (IaaS) cloud and show how this hierarchical collaboration of autonomic managers can help the system adapt to high-stress situations automatically and reduce the SLA violation rate without adding any new resources to the environment.
Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud
With the advent of cloud computing, organizations are nowadays able to react
rapidly to changing demands for computational resources. Not only individual
applications can be hosted on virtual cloud infrastructures, but also complete
business processes. This allows the realization of so-called elastic processes,
i.e., processes which are carried out using elastic cloud resources. Despite
the manifold benefits of elastic processes, there is still a lack of solutions
supporting them.
In this paper, we identify the state of the art of elastic Business Process
Management with a focus on infrastructural challenges. We conceptualize an
architecture for an elastic Business Process Management System and discuss
existing work on scheduling, resource allocation, monitoring, decentralized
coordination, and state management for elastic processes. Furthermore, we
present two representative elastic Business Process Management Systems which
are intended to counter these challenges. Based on our findings, we identify
open issues and outline possible research directions for the realization of
elastic processes and elastic Business Process Management.
Comment: Please cite as: S. Schulte, C. Janiesch, S. Venugopal, I. Weber, and
P. Hoenisch (2015). Elastic Business Process Management: State of the Art and
Open Challenges for BPM in the Cloud. Future Generation Computer Systems,
Volume NN, Number N, NN-NN., http://dx.doi.org/10.1016/j.future.2014.09.00
Towards an Autonomic and Distributed Device Management for the Internet of Things
Best paper award at the Doctoral Symposium of ICAC 2019.
Decentralized planning for self-adaptation in multi-cloud environment
The runtime management of Internet of Things (IoT) oriented applications deployed in multi-clouds is a complex issue due to the highly heterogeneous and dynamic execution environment. To cope effectively with such an environment, cross-layer and multi-cloud effects should be taken into account, and decentralized self-adaptation is a promising solution to maintain and evolve the applications for quality assurance. An important issue to be tackled in realizing this solution is the uncertain effect of an adaptation, which may negatively impact other layers or even other clouds. In this paper, we tackle this issue from the planning perspective, since an inappropriate planning strategy can cause the adaptation to fail. We therefore present an architectural model for decentralized self-adaptation that supports the cross-layer and multi-cloud environment. We also propose a planning model and method to enable decentralized decision making. The planning is formulated as a Reinforcement Learning problem and solved using the Q-learning algorithm. Through simulation experiments, we assess the effectiveness and sensitivity of the proposed planning approach. The results show that our approach can potentially reduce the negative impact on the cross-layer and multi-cloud environment.
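As a reminder of what solving a planning problem with Q-learning involves, here is a minimal tabular sketch. It does not reproduce the paper's planning model: the five-state chain environment, the two actions, the reward of 1 at the goal state, and all hyperparameters are assumptions chosen only to keep the example small and deterministic. The update rule itself is the standard one, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', ·) − Q(s, a)].

```python
import random

N_STATES = 5          # hypothetical states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)      # hypothetical actions: 0 = step left, 1 = step right


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    if action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):                      # cap on episode length
            if rng.random() < epsilon:            # explore
                action = rng.choice(ACTIONS)
            else:                                 # exploit, random tie-break
                best = max(q[state])
                action = rng.choice(
                    [a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy(train())` selects the action that moves toward the goal in every non-terminal state. In a decentralized setting each planner would keep its own table; that part is not shown here.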
Coordinating multi-site construction projects using federated clouds
The requirements imposed by AEC (Architecture/Engineering/Construction) projects with regard to data storage and execution, on-demand data sharing, and the complexity of building simulations have led to the use of novel computing techniques. In detail, these requirements refer to storing the large amounts of data that the AEC industry generates, from building schematics to associated data derived from the different contractors involved at various stages of the building lifecycle, and to running simulations on building models (such as energy efficiency, environmental impact, and occupancy simulations). Creating a computing infrastructure to support operations deriving from various AEC projects can be challenging due to the complexity of workflows, the distributed nature of the data, and the diversity of roles, profiles, and locations of the users. Federated clouds provide the means to create a distributed environment that supports multiple individuals and organisations working collaboratively. In this study we present how multi-site construction projects can be coordinated by the use of federated clouds where the interacting parties are represented by AEC industry organisations. We show how coordination can support (a) data sharing and interoperability using a multi-vendor Cloud environment and (b) process interoperability based on various stakeholders involved in the AEC project lifecycle. We develop a framework that facilitates project coordination with associated “issue status” implications and validate our outcome in a real construction project.
Coordinated Autonomic Managers for Energy Efficient Date Centers
The complexity of today’s data centers has led researchers to investigate ways in which autonomic methods can be used for data center management. Autonomic managers try to monitor and manage resources to ensure that the components they manage are self-configuring, self-optimizing, self-healing and self-protecting (so-called “self-*” properties). In this research, we consider autonomic management systems for data centers with a particular focus on making data centers more energy-aware. In particular, we consider policy-based, multi-manager autonomic management systems for energy-aware data centers. Our focus is on defining the foundations (the core concepts, entities, relationships and algorithms) for autonomic management systems capable of supporting a range of management configurations. Central to our approach is the notion of a “topology” of autonomic managers that, when instantiated, can support a range of different configurations of autonomic managers and communication among them. The notion of “policy” is broadened to enable some autonomic managers to have more direct control over the behavior of other managers through changes in policies. The ultimate goal is to create a management framework that would allow the data center administrator to (a) define managed objects, their corresponding managers, the management system topology, and policies to meet their operational needs and (b) rely on the management system to maintain itself automatically. A data center simulator that computes its energy consumption (computing and cooling) at any given time is implemented to evaluate the impact of different management scenarios. The management system is evaluated with different management scenarios in our simulated data center.
Towards the decentralized coordination of multiple self-adaptive systems
When multiple self-adaptive systems share the same environment and have
common goals, they may coordinate their adaptations at runtime to avoid
conflicts and to satisfy their goals. There are two approaches to coordination.
(1) Logically centralized, where a supervisor has complete control over the
individual self-adaptive systems. Such an approach is infeasible when the systems
have different owners or administrative domains. (2) Logically decentralized,
where coordination is achieved through direct interactions. Because the
individual systems have control over the information they share, decentralized
coordination accommodates multiple administrative domains. However, existing
techniques do not account simultaneously for both local concerns, e.g.,
preferences, and shared concerns, e.g., conflicts, which may lead to goals not
being achieved as expected. Our idea to address this shortcoming is to express
both types of concerns within the same constraint optimization problem. We
propose CoADAPT, a decentralized coordination technique introducing two types
of constraints: preference constraints, expressing local concerns, and
consistency constraints, expressing shared concerns. At runtime, the problem is
solved in a decentralized way using distributed constraint optimization
algorithms implemented by each self-adaptive system. As a first step in
realizing CoADAPT, we focus in this work on the coordination of adaptation
planning strategies, traditionally addressed only with centralized techniques.
We show the feasibility of CoADAPT in an exemplar from cloud computing and
experimentally analyze its scalability.
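The idea of expressing preference constraints and consistency constraints in one optimization problem can be sketched as follows. This is not CoADAPT: the abstract says the problem is solved with distributed constraint optimization algorithms, whereas this sketch uses a plain centralized brute-force search purely to show how the two kinds of constraints combine. The systems `A` and `B`, the VM-count domain, the cost weights, and the shared capacity limit are all hypothetical.

```python
from itertools import product

DOMAIN = (1, 2, 3)        # hypothetical choices: number of VMs per system
SHARED_CAPACITY = 4       # hypothetical limit shared by both systems
WEIGHTS = {"A": 2, "B": 1}


def preference_cost(system, vms):
    """Local concern: each system prefers more VMs; A cares twice as much."""
    return WEIGHTS[system] * (max(DOMAIN) - vms)


def consistent(assignment):
    """Shared concern: the systems must not exceed the common capacity."""
    return sum(assignment.values()) <= SHARED_CAPACITY


def solve():
    """Enumerate joint assignments and keep the cheapest consistent one."""
    systems = tuple(WEIGHTS)
    best, best_cost = None, float("inf")
    for values in product(DOMAIN, repeat=len(systems)):
        assignment = dict(zip(systems, values))
        if not consistent(assignment):
            continue
        cost = sum(preference_cost(s, v) for s, v in assignment.items())
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost
```

Here the consistency constraint is hard (violating assignments are discarded) and the preferences are soft costs. A decentralized solver would reach the same optimum through message exchange between the systems instead of a global enumeration.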