92 research outputs found

    Modeling and analysis of high availability techniques in a virtualized system

    Availability evaluation of a virtualized system is critical to the wide deployment of cloud computing services. Time-based and prediction-based rejuvenation of virtual machines (VMs) and virtual machine monitors, VM failover, and live VM migration are common high-availability (HA) techniques in a virtualized system. This paper investigates the effect of combining these availability techniques on VM availability in a virtualized system where various software and hardware failures may occur. For each combination, we construct analytic models […] rejuvenation mechanisms to improve VM availability; (2) prediction-based rejuvenation enhances VM availability much more than time-based VM rejuvenation when the probability of successful prediction is above 70%, regardless of whether failover and/or live VM migration is also deployed; (3) the failover mechanism outperforms live VM migration, although the two can work together for higher VM availability, and they can also be combined with software rejuvenation mechanisms for even higher availability; and (4) the time interval setting is critical to a time-based rejuvenation mechanism. These analytic results provide guidelines for deploying HA techniques in a virtualized system and setting their parameters.
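To make the idea of an analytic availability model with rejuvenation concrete, here is a minimal sketch, not the paper's model, of a textbook-style four-state continuous-time Markov chain (robust, failure-probable, failed, rejuvenating) in which rejuvenation is triggered from the failure-probable state at rate `r4`. The state layout, the function names, and every rate are hypothetical; the code only shows how steady-state availability follows from solving pi Q = 0 with sum(pi) = 1.

```python
# Minimal sketch: steady-state availability of a hypothetical 4-state CTMC
# (0 = robust, 1 = failure-probable, 2 = failed, 3 = rejuvenating).


def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 for a generator matrix q (list of rows)."""
    n = len(q)
    # Transpose so each row is a balance equation, then replace the last
    # (redundant) equation with the normalization condition.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / a[r][r]
    return pi


def availability(r2, lam, r1, r4, r3):
    """r2: aging rate, lam: failure rate, r1: repair rate,
    r4: rejuvenation trigger rate, r3: rejuvenation completion rate (per hour)."""
    q = [
        [-r2, r2, 0.0, 0.0],
        [0.0, -(lam + r4), lam, r4],
        [r1, 0.0, -r1, 0.0],
        [r3, 0.0, 0.0, -r3],
    ]
    pi = steady_state(q)
    return pi[0] + pi[1]  # the VM is up in the robust and failure-probable states


if __name__ == "__main__":
    base = dict(r2=1 / 240, lam=1 / 720, r1=1 / 2, r3=6.0)  # hypothetical rates
    print("no rejuvenation:", availability(r4=0.0, **base))
    print("with rejuvenation:", availability(r4=1 / 24, **base))
```

With these hypothetical rates, `r4 = 0` gives the no-rejuvenation baseline 960/962 (about 0.99792), while `r4 = 1/24` gives 8160/8167 (about 0.99914). Other rate choices can shrink or reverse that gain, which is why the rejuvenation parameters need to be evaluated rather than assumed.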

    Stochastic Reward Net-based Modeling Approach for Availability Quantification of Data Center Systems

    Availability quantification and prediction of IT infrastructure in data centers are of paramount importance for online business enterprises. In this chapter, we present comprehensive availability models for practical case studies to demonstrate how state-space stochastic reward net models can be used to quantitatively assess the availability of typical data center systems. We present stochastic reward net models of a virtualized server system, a data center network based on the DCell topology, and a conceptual data center for disaster tolerance. The systems are then evaluated against various metrics of interest, including steady-state availability, downtime, and downtime cost, together with sensitivity analysis.
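The metrics listed above can be illustrated with a deliberately simplified two-state (up/down) model. This is a minimal sketch, not a stochastic reward net: in an SRN the steady-state availability would come from the net's steady-state probabilities, but downtime, downtime cost, and parametric sensitivity follow from it in the same way. The MTTF, MTTR, and cost figures and all function names are hypothetical.

```python
# Minimal sketch: availability metrics for a hypothetical up/down component.
HOURS_PER_YEAR = 8760.0


def steady_state_availability(mttf_h, mttr_h):
    """A = MTTF / (MTTF + MTTR) for a two-state repairable component."""
    return mttf_h / (mttf_h + mttr_h)


def annual_downtime_minutes(a):
    """Expected downtime per year, in minutes, for steady-state availability a."""
    return (1.0 - a) * HOURS_PER_YEAR * 60.0


def annual_downtime_cost(a, cost_per_hour):
    """Expected yearly downtime cost for a given cost per hour of outage."""
    return (1.0 - a) * HOURS_PER_YEAR * cost_per_hour


def sensitivity(f, x, rel_step=1e-6):
    """Central finite-difference estimate of df/dx (parametric sensitivity)."""
    h = x * rel_step
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    mttf, mttr = 2000.0, 4.0  # hypothetical hours
    a = steady_state_availability(mttf, mttr)
    print(f"A = {a:.6f}")
    print(f"downtime = {annual_downtime_minutes(a):.1f} min/year")
    print(f"cost = {annual_downtime_cost(a, 10_000.0):.0f} per year")
    # dA/dMTTR; the analytic value is -MTTF / (MTTF + MTTR) ** 2
    print("dA/dMTTR =", sensitivity(lambda r: steady_state_availability(mttf, r), mttr))
```

For the hypothetical figures above, A is about 0.998004, or roughly 1049 minutes of downtime per year, and the negative sensitivity says each extra hour of MTTR costs about 0.0005 of availability.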

    Proactive software rejuvenation solution for web environments on virtualized platforms

    The availability of information technologies for everything, from everywhere, at all times is a growing requirement. We use information technologies for everything from common and social tasks to critical tasks such as managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a huge challenge. A quick look at the news turns up reports of corporate outages that affect millions of users and hurt the companies' revenue and image. It is well known that computer system outages are currently caused more often by software faults than by hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon. This term refers to the accumulation of errors, usually causing resource contention, during long-running application executions such as web applications, which normally causes applications or systems to hang or crash. Software aging can also be accompanied by gradual performance degradation. Software aging phenomena are often related to memory bloating/leaks, unterminated threads, data corruption, unreleased file locks, or overruns. Several examples of software aging can be found in industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution that protects Internet services against software aging caused by resource exhaustion. To this end, we first present a threshold-based proactive rejuvenation approach to avoid the consequences of software aging. This first approach has some limitations, the most important of which is the need to know a priori the resource or resources involved in the crash and their critical values. Moreover, some expertise is needed to set the threshold value that triggers the rejuvenation action. 
Due to these limitations, we have evaluated the use of machine learning to overcome the weaknesses of our first approach and obtain a proactive and predictive solution. Finally, the current and growing tendency to use virtualization technologies to improve resource utilization has turned traditional data centers into virtualized data centers or platforms. We have used a mathematical programming approach to virtual machine allocation and migration to optimize resource use, accepting as many services as possible on the platform while guaranteeing, via our software rejuvenation proposal, the availability of the deployed services against software aging. The thesis is supported by an exhaustive experimental evaluation that demonstrates the effectiveness and feasibility of our proposals for current systems.
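The contrast between the thesis's first, threshold-based approach and a predictive one can be sketched as follows. This is a minimal illustration under stated assumptions: the thesis uses machine learning models for prediction, whereas this sketch swaps in a plain least-squares linear trend as a much simpler stand-in, and all function names, thresholds, and sample values are hypothetical.

```python
# Minimal sketch: reactive threshold rule vs. a trend-based predictive rule.
# The linear trend is a simple stand-in for the thesis's ML predictors (assumption).


def linear_trend(samples):
    """Least-squares slope and intercept for a list of (time, value) samples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def threshold_trigger(samples, threshold):
    """Reactive rule: rejuvenate once the latest reading reaches the threshold."""
    return samples[-1][1] >= threshold


def predicted_time_to_exhaustion(samples, capacity):
    """Time left (same unit as the samples) until the trend reaches capacity."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return float("inf")  # no upward trend, so no exhaustion is predicted
    t_exhaust = (capacity - intercept) / slope
    return max(0.0, t_exhaust - samples[-1][0])


def predictive_trigger(samples, capacity, horizon):
    """Predictive rule: rejuvenate if exhaustion is expected within `horizon`."""
    return predicted_time_to_exhaustion(samples, capacity) <= horizon


if __name__ == "__main__":
    # Hypothetical memory samples: (minutes, MB used), leaking 10 MB per minute.
    mem = [(0.0, 1000.0), (1.0, 1010.0), (2.0, 1020.0), (3.0, 1030.0)]
    print("threshold rule fires:", threshold_trigger(mem, threshold=1800.0))
    print("minutes to exhaustion:", predicted_time_to_exhaustion(mem, capacity=2048.0))
    print("predictive rule fires:", predictive_trigger(mem, capacity=2048.0, horizon=120.0))
```

Here the predictive rule fires roughly 100 minutes before the hypothetical 2048 MB limit is reached, while the fixed 1800 MB threshold has not yet been crossed; how early the reactive rule reacts depends entirely on how well the threshold was chosen.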

    Proactive cloud management for highly heterogeneous multi-cloud infrastructures

    Various studies in the literature have demonstrated that the cloud computing paradigm can help improve the availability and performance of applications subject to software anomalies. Indeed, the cloud resource provisioning model enables users to rapidly access new processing resources, even distributed over different geographical regions, which can be promptly used in the case of, e.g., crashes or hangs of running machines, as well as to balance the load when machines are overloaded. Nevertheless, managing a geographically distributed cloud deployment can be a complex and time-consuming task. The Autonomic Cloud Manager (ACM) Framework is an autonomic framework that supports proactive management of applications deployed over multiple cloud regions. It uses machine learning models to predict failures of virtual machines and to proactively redirect the load to healthy machines or cloud regions. In this paper, we study different policies for efficient proactive load balancing across cloud regions to mitigate the effect of software anomalies. These policies use predictions of the mean time to failure of virtual machines. We consider the case of heterogeneous cloud regions, i.e., regions with different amounts of resources, and we provide an experimental assessment of these policies in the context of the ACM Framework.
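A simple proactive policy of the kind described above can be sketched by weighting each region according to its healthy capacity. This is a hypothetical policy for illustration only, not one of the policies evaluated in the paper and not part of any ACM Framework API; the region layout, the `min_ttf` cut-off, and the per-VM predicted time to failure (assumed to come from some ML predictor) are all assumptions.

```python
# Hypothetical proactive load-balancing policy across heterogeneous regions.


def region_weights(regions, min_ttf):
    """Return the share of incoming load to send to each region.

    regions maps a region name to a list of (vm_capacity, predicted_ttf)
    pairs, where predicted_ttf is a predicted time to failure in hours.
    VMs predicted to fail within min_ttf hours are drained (weight 0), and
    the load is split in proportion to each region's remaining healthy capacity.
    """
    healthy = {
        name: sum(cap for cap, ttf in vms if ttf >= min_ttf)
        for name, vms in regions.items()
    }
    total = sum(healthy.values())
    if total == 0:
        raise RuntimeError("no healthy capacity left in any region")
    return {name: cap / total for name, cap in healthy.items()}


if __name__ == "__main__":
    regions = {
        "region-a": [(4, 50.0), (4, 3.0)],  # second VM predicted to fail soon
        "region-b": [(8, 100.0), (2, 200.0)],
    }
    print(region_weights(regions, min_ttf=10.0))
```

Weighting by healthy capacity rather than by VM count is what lets such a policy cope with regions that have different amounts of resources.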

    An analysis of software aging in cloud environment

    Cloud computing is an environment in which several virtual machines (VMs) run concurrently on physical machines. The cloud computing infrastructure hosts multiple cloud service segments that communicate with each other through interfaces, creating a distributed computing environment. During operation, software systems accumulate errors or garbage that lead to system failure and other hazardous consequences. This condition is called software aging. Software aging is caused by memory fragmentation, large-scale resource consumption, and the accumulation of numerical errors. It degrades performance and may result in system failure because of premature resource exhaustion. The issue cannot be detected during the software testing phase because of the dynamic nature of operation. The errors that cause software aging are of a special type: they do not disturb the software's functionality but affect the response time and its environment. Because the problem arises from the dynamic nature of operation, it can only be resolved at run time. To alleviate the impact of software aging, the software rejuvenation technique is used. Rejuvenation reboots the system or restarts the software, which avoids faults or failures. Software rejuvenation removes accumulated error conditions, frees up deadlocks, and defragments operating system resources such as memory. Hence, it prevents future system failures that may occur due to software aging. As service availability is crucial, software rejuvenation should be carried out on defined schedules without disrupting the service. Software rejuvenation techniques can make software systems more trustworthy, and software designers are using this concept to improve the quality and reliability of software. Software aging and rejuvenation have generated a lot of research interest in recent years. 
This work reviews some of the research on detecting software aging and identifies research gaps.
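The point about carrying out rejuvenation on a defined schedule can be made quantitative with the classic age-replacement argument from renewal theory. In this minimal sketch, which is assumed rather than taken from any of the reviewed works, time to failure follows a Weibull distribution with shape > 1 (an increasing failure rate, as aging implies), each rejuvenation costs `d_rejuv` hours of downtime, each failure costs `d_fail` hours, and both leave the system as good as new. All parameter values and function names are hypothetical.

```python
# Minimal sketch: choosing a time-based rejuvenation interval (age-replacement model).
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape): probability of surviving past t hours."""
    return math.exp(-((t / scale) ** shape))


def unavailability(interval, shape, scale, d_rejuv, d_fail, steps=2000):
    """Long-run fraction of time down when rejuvenating every `interval` hours.

    Renewal-reward argument: each cycle ends with rejuvenation (prob R(T)) or
    with a failure before T; expected up time per cycle is the integral of R
    over [0, T], expected down time is d_rejuv * R(T) + d_fail * (1 - R(T)).
    """
    h = interval / steps
    r = weibull_reliability(interval, shape, scale)
    # Trapezoid rule for the integral of R(t) from 0 to interval; R(0) = 1.
    inner = sum(weibull_reliability(i * h, shape, scale) for i in range(1, steps))
    up = h * (0.5 * (1.0 + r) + inner)
    down = d_rejuv * r + d_fail * (1.0 - r)
    return down / (up + down)


def best_interval(candidates, **params):
    """Grid search for the candidate interval with the lowest unavailability."""
    return min(candidates, key=lambda t: unavailability(t, **params))


if __name__ == "__main__":
    params = dict(shape=2.0, scale=720.0, d_rejuv=0.1, d_fail=2.0)  # hypothetical
    t_best = best_interval(range(24, 2001, 24), **params)
    print(t_best, unavailability(t_best, **params), unavailability(2000, **params))
```

With these hypothetical parameters the best interval is roughly a week; much shorter intervals waste time on planned outages and much longer ones let failures dominate, so the schedule itself has to be tuned.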

    Multi-perspective Evaluation of Self-Healing Systems Using Simple Probabilistic Models

    Quantifying the efficacy of self-healing systems is a challenging but important task, which has implications for increasing designer, operator, and end-user confidence in these systems. During design, system architects benefit from tools and techniques that enhance their understanding of the system, allowing them to reason about the tradeoffs of proposed or existing self-healing mechanisms and about the overall effectiveness of the system resulting from different mechanism compositions. At deployment time, system integrators and operators need to understand how the self-healing mechanisms work and how their operation affects the system's reliability, availability, and serviceability (RAS), in order to cope with any limitations of these mechanisms when the system is placed into production. In this paper we construct an evaluation framework for self-healing systems around simple yet powerful probabilistic models that capture the behavior of the system's self-healing mechanisms from multiple perspectives (designer, operator, and end user). We combine these analytical models with runtime fault injection to study the operation of VM-Rejuv, a virtual-machine-based rejuvenation scheme for web application servers. We use the results from the fault-injection experiments and model analysis to reason about the efficacy of VM-Rejuv, its limitations, and strategies for managing or mitigating these limitations in system deployments. Although we use VM-Rejuv as the subject of our evaluation in this paper, our main contribution is a practical evaluation approach that can be generalized to other self-healing systems.
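As an illustration of how a simple probabilistic model can connect fault-injection results to an operator-facing metric, here is a minimal sketch built on a hypothetical coverage model: a fraction `coverage` of failures is healed automatically with a short recovery time, and the rest need slower manual repair. The model, function names, and numbers are assumptions for illustration, not the models defined in the paper.

```python
# Hypothetical coverage model linking fault-injection results to availability.


def coverage_from_injection(healed, injected):
    """Point estimate of coverage: fraction of injected faults healed automatically."""
    if injected <= 0:
        raise ValueError("need at least one injected fault")
    return healed / injected


def coverage_availability(mttf_h, coverage, mttr_auto_h, mttr_manual_h):
    """A = MTTF / (MTTF + mean downtime per failure).

    Covered failures are recovered in mttr_auto_h hours, uncovered ones are
    repaired manually in mttr_manual_h hours.
    """
    mean_downtime = coverage * mttr_auto_h + (1.0 - coverage) * mttr_manual_h
    return mttf_h / (mttf_h + mean_downtime)


if __name__ == "__main__":
    c = coverage_from_injection(healed=90, injected=100)
    print("coverage:", c)
    print("availability:", coverage_availability(1000.0, c, 0.05, 2.0))
```

The same model serves several perspectives: a designer reads off the coverage, an operator the availability or annual downtime, and, assuming requests arrive independently of the system state, an end user can read 1 - A as the chance that a request lands during an outage.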

    A Latency-driven Availability Assessment for Multi-Tenant Service Chains

    Nowadays, most telecommunication services adhere to the Service Function Chain (SFC) paradigm, in which network functions are implemented in software. In particular, container virtualization is becoming a popular approach to deploying network functions and enabling resource slicing among several tenants. The resulting infrastructure is a complex system composed of a large number of containers implementing different SFC functionalities, with different tenants sharing the same chain. The complexity of such a scenario leads us to evaluate two critical metrics: the steady-state availability (the probability that a system is functioning in the long run) and the latency (the time between a service request and the corresponding response). Consequently, we propose a latency-driven availability assessment for multi-tenant service chains implemented via Containerized Network Functions (CNFs). We adopt a multi-state system to model single CNFs and the queueing formalism to characterize service latency. To compute the availability efficiently, we develop a modified version of the Multidimensional Universal Generating Function (MUGF) technique. Finally, we solve an optimization problem to minimize the SFC cost under an availability constraint. As a relevant example of an SFC, we consider a containerized version of the IP Multimedia Subsystem, whose parameters have been estimated through fault injection techniques and load tests.
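The multi-state and queueing ideas above can be illustrated with a basic one-dimensional universal generating function (UGF). This is a simplified sketch, not the paper's modified MUGF: each CNF replica is assumed to be either fully up (serving at `service_rate`) or down, replicas fail independently, their capacities add up, and the aggregate is crudely treated as a single M/M/1 server, so a state counts as available only if its mean latency 1/(mu - lambda) stays within `max_latency`. All names and numbers are hypothetical.

```python
# Simplified sketch: latency-driven availability of one CNF via a basic UGF.
# Assumptions: independent two-state replicas, additive capacity, M/M/1 latency.


def ugf_parallel(u1, u2):
    """Compose two UGFs (capacity -> probability) whose capacities add up."""
    out = {}
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            out[g1 + g2] = out.get(g1 + g2, 0.0) + p1 * p2
    return out


def cnf_ugf(replicas, service_rate, replica_availability):
    """UGF of a CNF made of identical replicas (up: full rate, down: 0)."""
    u = {0.0: 1.0}
    replica = {service_rate: replica_availability, 0.0: 1.0 - replica_availability}
    for _ in range(replicas):
        u = ugf_parallel(u, replica)
    return u


def latency_driven_availability(u, arrival_rate, max_latency):
    """Probability of a state whose M/M/1 mean latency 1/(mu - lambda) is within bound."""
    return sum(
        p for mu, p in u.items()
        if mu > arrival_rate and 1.0 / (mu - arrival_rate) <= max_latency
    )


if __name__ == "__main__":
    u = cnf_ugf(replicas=2, service_rate=100.0, replica_availability=0.9)
    print(latency_driven_availability(u, arrival_rate=80.0, max_latency=0.04))
    print(latency_driven_availability(u, arrival_rate=80.0, max_latency=0.1))
```

With two replicas at 90% availability, 100 requests/s each, and 80 requests/s offered, a 40 ms bound accepts only the state where both replicas are up (0.81), while a 100 ms bound also accepts the single-replica state (0.99). A purely capacity-based check (mu > lambda) would count both states as up, which is the distinction a latency-driven assessment captures.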