15 research outputs found

    Using Virtualization to Improve Software Rejuvenation

    In this paper, we present an approach to software rejuvenation based on automated self-healing techniques that can be easily applied to off-the-shelf application servers and Internet sites. Software aging and transient failures are detected through continuous monitoring of system data and performability metrics of the application server. If anomalous behavior is identified, the system triggers an automatic rejuvenation action. This self-healing scheme is designed to be as non-disruptive as possible for the running service and to achieve zero downtime in most cases. In our scheme, we exploit virtualization to optimize the self-recovery actions. The techniques described in this paper have been tested with a set of open-source Linux tools and the Xen virtualization middleware. We conducted an experimental study with two application benchmarks (Tomcat/Axis and TPC-W). Our results demonstrate that virtualization can be extremely helpful for software rejuvenation and fail-over in the event of transient application failures and software aging.
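
    The abstract describes monitoring an application server and letting virtualization hide the rejuvenation restart. A minimal sketch of that idea, assuming an active/standby pair of VMs and a moving-average response-time check as the anomaly detector; the class names, window size, and latency limit are hypothetical, and neither the paper's actual detector nor its Xen integration is reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    """Hypothetical stand-in for a guest VM that hosts an application server."""
    name: str
    response_times_ms: list = field(default_factory=list)

    def restart(self) -> None:
        # Restarting discards the aging state accumulated by the server
        # (modeled here only as the recorded response-time history).
        self.response_times_ms.clear()


class RejuvenationController:
    """Keeps an active and a standby VM and swaps them when the active one looks aged."""

    def __init__(self, active: VirtualMachine, standby: VirtualMachine,
                 latency_limit_ms: float, window: int = 5):
        self.active = active
        self.standby = standby
        self.latency_limit_ms = latency_limit_ms
        self.window = window
        self.rejuvenations = 0

    def _is_anomalous(self) -> bool:
        # Moving average of the last `window` response times vs. a fixed limit.
        recent = self.active.response_times_ms[-self.window:]
        if len(recent) < self.window:
            return False  # not enough observations yet
        return sum(recent) / len(recent) > self.latency_limit_ms

    def observe(self, response_time_ms: float) -> str:
        """Record one response time; return the name of the VM now serving requests."""
        self.active.response_times_ms.append(response_time_ms)
        if self._is_anomalous():
            # 1) Redirect traffic to the standby replica first, so requests keep flowing.
            # 2) Then restart the aged VM, which becomes the new standby.
            aged = self.active
            self.active, self.standby = self.standby, aged
            aged.restart()
            self.rejuvenations += 1
        return self.active.name


if __name__ == "__main__":
    ctrl = RejuvenationController(VirtualMachine("vm-a"), VirtualMachine("vm-b"),
                                  latency_limit_ms=200)
    for rt in [50, 60, 55, 300, 400, 450, 500, 520]:
        serving = ctrl.observe(rt)
    print(serving, ctrl.rejuvenations)  # vm-b 1
```

    Switching traffic before restarting is what allows the restart itself to stay invisible to clients; the swap costs one idle standby VM.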

    Proactive cloud management for highly heterogeneous multi-cloud infrastructures

    Various studies in the literature have demonstrated that the cloud computing paradigm can help improve the availability and performance of applications subject to software anomalies. Indeed, the cloud resource provisioning model enables users to rapidly access new processing resources, even ones distributed over different geographical regions, which can be promptly used in the event of, e.g., crashes or hangs of running machines, as well as to balance the load when machines are overloaded. Nevertheless, managing a complex, geographically distributed cloud deployment can be a difficult and time-consuming task. The Autonomic Cloud Manager (ACM) Framework is an autonomic framework for supporting proactive management of applications deployed over multiple cloud regions. It uses machine learning models to predict failures of virtual machines and to proactively redirect the load to healthy machines/cloud regions. In this paper, we study different policies for performing efficient proactive load balancing across cloud regions in order to mitigate the effect of software anomalies. These policies use predictions of the mean time to failure of virtual machines. We consider the case of heterogeneous cloud regions, i.e., regions with different amounts of resources, and we provide an experimental assessment of these policies in the context of the ACM Framework.
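
    The abstract does not spell out the policies themselves. As an illustration only, one plausible policy weights each region by its capacity and by how much of a planning horizon its predicted mean time to failure (MTTF) covers, and drains regions predicted to fail soon. The function name, horizon, cutoff, and region names are hypothetical, not the ACM Framework's API.

```python
def distribute_load(regions: dict[str, tuple[float, float]],
                    horizon_s: float, min_mttf_s: float) -> dict[str, float]:
    """Return the fraction of incoming requests to route to each cloud region.

    `regions` maps a region name to (capacity, predicted MTTF in seconds), where
    capacity is any comparable resource measure (e.g. number of vCPUs).
    """
    weights = {}
    for name, (capacity, mttf_s) in regions.items():
        if mttf_s < min_mttf_s:
            weights[name] = 0.0  # predicted to fail soon: drain this region
        else:
            # Scale capacity by the share of the planning horizon the region is
            # expected to survive, capped at 1.
            weights[name] = capacity * min(mttf_s, horizon_s) / horizon_s
    total = sum(weights.values())
    if total == 0:
        raise RuntimeError("no region is predicted to stay healthy")
    return {name: w / total for name, w in weights.items()}


shares = distribute_load(
    {"eu": (8, 3600), "us": (4, 7200), "asia": (16, 120)},
    horizon_s=3600, min_mttf_s=300)
print(shares)  # eu ≈ 0.667, us ≈ 0.333, asia = 0.0
```

    Multiplying by capacity is what makes the policy sensitive to heterogeneous regions: a large region with a short predicted MTTF can still receive less load than a small, stable one.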

    Machine Learning for Achieving Self-* Properties and Seamless Execution of Applications in the Cloud

    Software anomalies are recognized as a major problem affecting the performance and availability of many computer systems. The accumulation of anomalies of different kinds, such as memory leaks and unterminated threads, may cause the system either to fail or to run at suboptimal performance levels. This problem particularly affects web servers, where hosted applications are typically intended to run continuously, which increases the probability of anomaly accumulation and, consequently, its effects. Because the occurrence of anomalies is unpredictable, continuous system monitoring is required to detect possible system failures and/or excessive performance degradation in order to start a recovery procedure in time. In this paper, we present a Machine Learning-based framework for the proactive management of client-server applications in the cloud. Using optimized Machine Learning models and continuously measured system features, the framework predicts the remaining time until some unexpected event (system failure, service level agreement violation, etc.) occurs on a virtual machine hosting a server instance of the application. The framework is able to manage virtual machines in the presence of different types of anomalies and different anomaly occurrence patterns. We show the effectiveness of the proposed solution by presenting the results of a set of experiments carried out in a real-world-inspired scenario.
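
    The paper's optimized Machine Learning models are not specified in the abstract. As a deliberately simple stand-in, the sketch below trains a one-feature linear regression that maps memory in use to the observed remaining time to failure (RTTF) from past runs, then predicts RTTF for a new measurement. The training data and function names are hypothetical.

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict_rttf(model: tuple[float, float], memory_used_mb: float) -> float:
    """Predicted remaining time to failure (seconds) for the current memory usage."""
    a, b = model
    return max(0.0, a + b * memory_used_mb)  # a remaining time cannot be negative


# Hypothetical training samples from past runs: (memory in use, observed RTTF).
memory_mb = [100, 200, 300, 400]
rttf_s = [900, 600, 300, 0]
model = fit_linear(memory_mb, rttf_s)
print(model, predict_rttf(model, 250))  # (1200.0, -3.0) 450.0
```

    A real deployment would use many features (threads, swap, response time, and so on) and a richer model; the point here is only the shape of the pipeline: learn from labeled past runs, then predict RTTF online.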

    Detecting Software Aging in a Cloud Computing Framework by Comparing Development Versions

    Software aging, i.e., the degradation of software performance or functionality caused by resource depletion, is usually discovered only in production. This incurs large costs and delays in defect removal and requires provisional solutions such as rejuvenation (controlled restarts). We propose a method for detecting aging problems shortly after their introduction through runtime comparisons of different development versions of the same software. Possible aging issues are discovered by analyzing the differences in runtime traces of selected metrics. The required comparisons are workload-independent, which minimizes the additional effort of dedicated stress tests. Consequently, the method requires only minimal changes to the traditional development and testing process. This paves the way for detecting such problems before public releases, greatly reducing the cost of fixing defects. Our study focuses on memory leaks in Eucalyptus, a popular open-source framework for managing cloud computing environments.
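
    The core idea is to compare a metric's runtime trace across two development versions. The sketch below is much simpler than the paper's workload-independent comparison: it assumes both versions were traced under comparable conditions and flags the newer one if its memory grows noticeably faster, using the least-squares slope of each trace. Function names, traces, and the tolerance are hypothetical.

```python
def slope(trace: list[tuple[float, float]]) -> float:
    """Least-squares slope of a metric over time for (time_s, value) samples."""
    n = len(trace)
    mean_t = sum(t for t, _ in trace) / n
    mean_v = sum(v for _, v in trace) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in trace)
    den = sum((t - mean_t) ** 2 for t, _ in trace)
    return num / den


def suspect_aging(old_trace, new_trace, tolerance_mb_per_s: float = 0.1) -> bool:
    """Flag the new version if its memory grows noticeably faster than the old one's."""
    return slope(new_trace) - slope(old_trace) > tolerance_mb_per_s


# Hypothetical memory traces (time in s, resident memory in MB) of two versions.
v1 = [(0, 100), (10, 101), (20, 100), (30, 101)]
v2 = [(0, 100), (10, 105), (20, 110), (30, 115)]
print(suspect_aging(v1, v2), suspect_aging(v1, v1))  # True False
```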

    Experimental Validation of the Suitability of Virtualization-Based Replication for Fault Tolerance in Real-Time Control of Electric Grids

    Real-time control systems (RTCSs) perform complex control and require low response times. They typically use third-party software libraries and are deployed on generic hardware, both of which can suffer from delay faults that may cause serious damage. To improve availability and latency, the controllers in RTCSs are replicated on physical nodes. As physical replication is expensive, we study the alternative of exploiting virtualization technology to run multiple virtual replicas on the same physical node. Because virtual replicas share the same resources, the delay faults they experience might be correlated, which would make such a replication method unsuitable. We conduct several experiments with an RTCS for electric grids, with multiple virtual replicas of its controller. We find that although the delay of a virtual machine is higher than that of a physical machine, the correlation between high delays among the virtual replicas is insignificant, resulting in improved overall availability. We conclude that virtual replication is indeed applicable to certain RTCSs, as it can improve reliability without added cost.
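
    Why the correlation matters can be shown with a small calculation. The sketch assumes that a control round succeeds if at least one replica responds within a deadline (a hypothetical availability definition; the paper's exact metric may differ), and computes both that availability and the Pearson correlation of two replicas' delays. The delay values are invented for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long delay series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def availability(replica_delays: list[list[float]], deadline_ms: float) -> float:
    """Fraction of control rounds in which at least one replica meets the deadline."""
    rounds = list(zip(*replica_delays))  # one tuple of per-replica delays per round
    met = sum(1 for delays in rounds if min(delays) <= deadline_ms)
    return met / len(rounds)


# Hypothetical per-round delays (ms) of two virtual replicas on one host.
a = [10, 10, 50, 10]
b = [10, 50, 10, 10]
print(availability([a], 20), availability([a, b], 20), round(pearson(a, b), 3))
# 0.75 1.0 -0.333
```

    Because the two replicas' slow rounds never coincide here, replication lifts availability from 0.75 to 1.0; had the high delays been strongly positively correlated, the second replica would add little.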

    Software Aging in the Context of Its Reliability: A Review of the Issues

    This paper presents a review and analysis of the literature devoted to the software aging phenomenon. The aging process is characterized as performance deterioration and an increasing failure rate, which have a negative impact on software reliability. The study found that software errors and their accumulation during program execution are the cause of software aging. The basic concepts and characteristics related to the aging phenomenon are identified, such as aging effects, factors, and metrics, time to resource exhaustion, time to aging-related failure, and workload. A key characteristic of software aging is that it can be removed or delayed by external intervention; the technique of preventing and delaying aging is called software rejuvenation. The paper considers a common set of factors that is characteristic of all systems and of the aging phenomenon in general. These factors can be classified into two types: internal, such as software errors and code metrics, and external, such as environmental, human, and functional factors. An important task is to identify the specific factors for specific systems, in particular mobile platforms. The study reviews and compares the main methods and approaches for studying and modeling the software aging process. The aging phenomenon is studied at the theoretical level using analytical models and at the empirical level using data analysis. The paper argues that hybrid approaches are a promising research direction because they combine the benefits of approaches based on analytical models and on measurements. Aging characteristics indicate that mobile operating systems and applications are exposed to aging, and this phenomenon needs to be studied in order to ensure the reliability of modern software. Mobile systems are vulnerable to aging effects, since they run for long periods without rebooting and have limited resources, such as memory.
To sum up, further research on mobile software aging is needed, in particular to identify the aging factors of mobile applications and to explore how existing methods and models apply to mobile and embedded systems.
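
    Time to resource exhaustion, one of the aging metrics listed above, is often estimated by fitting a linear trend to resource usage and extrapolating it to capacity. One common formulation, given as an illustration rather than the model of any specific reviewed work (the symbols are ours): with R_max the resource capacity, R̂(t) the fitted usage at time t, and β > 0 the fitted growth rate,

```latex
\hat{R}(t) = R_0 + \beta t, \qquad
T_{\text{exh}}(t) = \frac{R_{\max} - \hat{R}(t)}{\beta}, \qquad \beta > 0
```

    For example, a mobile device with 4096 MB of memory, 3072 MB currently in use, and a leak rate of 2 MB/min would be estimated to exhaust memory after (4096 - 3072) / 2 = 512 minutes.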

    Proactive software rejuvenation solution for web environments on virtualized platforms

    The availability of information technologies for everything, from everywhere, at all times is a growing requirement. We use information technologies for everything from common and social tasks to critical tasks such as managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a major challenge today. A quick look at the news reveals reports of corporate outages that affect millions of users and damage companies' revenue and image. It is well known that computer system outages are currently more often due to software faults than to hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon. This term refers to the accumulation of errors, usually causing resource contention, during long-running application executions, such as web applications, which typically causes applications/systems to hang or crash. Gradual performance degradation can also accompany software aging. Software aging phenomena are often related to memory bloat/leaks, unterminated threads, data corruption, unreleased file locks, or overruns. Several examples of software aging can be found in industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution for Internet services against software aging caused by resource exhaustion. To this end, we first present a threshold-based proactive rejuvenation approach to avoid the consequences of software aging. This first approach has some limitations, the most important of which is the need to know a priori the resource or resources involved in the crash and their critical values. Moreover, some expertise is needed to set the threshold value that triggers the rejuvenation action.
Due to these limitations, we evaluated the use of Machine Learning to overcome the weaknesses of our first approach and obtain a proactive and predictive solution. Finally, the current and increasing tendency to use virtualization technologies to improve resource utilization has turned traditional data centers into virtualized data centers or platforms. We used a Mathematical Programming approach to virtual machine allocation and migration to optimize resources, accepting as many services as possible on the platform while at the same time guaranteeing the availability (via our software rejuvenation proposal) of the deployed services against software aging. The thesis is supported by an exhaustive experimental evaluation that demonstrates the effectiveness and feasibility of our proposals for current systems.
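
    The first, threshold-based approach can be sketched as a loop over periodic resource samples that triggers rejuvenation as soon as any monitored resource reaches its threshold. The resource names and threshold values are hypothetical; note that they must be fixed in advance, which is precisely the limitation the thesis then addresses with Machine Learning.

```python
from typing import Callable, Optional


def crossed_thresholds(usage: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the monitored resources whose usage has reached its threshold."""
    return [name for name, limit in thresholds.items() if usage.get(name, 0.0) >= limit]


def monitor(samples: list[dict[str, float]], thresholds: dict[str, float],
            rejuvenate: Callable[[int, list[str]], None]) -> Optional[int]:
    """Scan periodic usage samples and trigger rejuvenation at the first threshold crossing."""
    for step, usage in enumerate(samples):
        crossed = crossed_thresholds(usage, thresholds)
        if crossed:
            rejuvenate(step, crossed)
            return step
    return None


# Hypothetical thresholds chosen in advance by an expert.
thresholds = {"heap_used_fraction": 0.85, "threads": 500}
samples = [
    {"heap_used_fraction": 0.50, "threads": 100},
    {"heap_used_fraction": 0.70, "threads": 200},
    {"heap_used_fraction": 0.90, "threads": 250},
]
events = []
step = monitor(samples, thresholds, lambda s, names: events.append((s, names)))
print(step, events)  # 2 [(2, ['heap_used_fraction'])]
```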

    Autonomic Rejuvenation of Cloud Applications as a Countermeasure to Software Anomalies

    Failures in computer systems can often be traced to software anomalies of various kinds. In many scenarios, it may be difficult, infeasible, or unprofitable to carry out the extensive debugging needed to find the causes of anomalies and remove them. In other cases, taking corrective actions may lead to undesirable service downtime. In this article, we propose an alternative approach to coping with software anomalies in cloud-based applications, and we present the design of a distributed autonomic framework that implements our approach. It exploits the elastic capabilities of cloud infrastructures and relies on machine learning models, proactive rejuvenation techniques, and a new load balancing approach. By combining all these elements, we show that it is possible to improve both the availability and the performance of applications deployed over heterogeneous cloud regions and subject to frequent failures. Overall, our study demonstrates the viability of our approach, opening the way toward its adoption and encouraging further studies and practical experience to evaluate and improve it.
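
    The framework combines machine learning predictions with proactive rejuvenation on elastic cloud resources. The article's own decision logic is not given in the abstract; the sketch below shows one hypothetical rule: rejuvenate a VM once its predicted remaining time to failure no longer leaves room to boot a replacement plus a safety margin. The predicted values are assumed to come from some failure-prediction model, and all names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class VmStatus:
    name: str
    predicted_rttf_s: float  # output of a failure-prediction model (assumed given)


def plan_rejuvenations(vms: list[VmStatus], boot_time_s: float,
                       safety_margin_s: float) -> list[str]:
    """Names of VMs to rejuvenate now, most urgent first.

    A VM is selected once its predicted remaining time to failure no longer leaves
    room to boot a replacement (elastic scale-out) plus a safety margin.
    """
    deadline_s = boot_time_s + safety_margin_s
    due = [vm for vm in vms if vm.predicted_rttf_s <= deadline_s]
    return [vm.name for vm in sorted(due, key=lambda vm: vm.predicted_rttf_s)]


fleet = [VmStatus("vm-1", 900), VmStatus("vm-2", 150), VmStatus("vm-3", 100)]
print(plan_rejuvenations(fleet, boot_time_s=120, safety_margin_s=60))  # ['vm-3', 'vm-2']
```

    Tying the trigger to boot time means the replacement is ready before the old VM is expected to fail, so the load balancer can shift traffic without a gap.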

    Using Session Replication in Web Services

    Web services have come to play an important role in people's everyday lives. Along with their importance, the requirements and expectations for web services have risen, and they will continue to grow in the future. Session replication is a technique that improves many quality aspects of web services, such as maintainability, availability, and reliability. It reduces server downtime and improves the user experience by improving these qualities of the web service. The idea of session replication is to make the users' session objects held by each individual application server of a web service available to all the application servers by storing them in third-party storage. This thesis presents a broad survey of the technologies used on the web and a survey of the most promising open-source session replication solutions for the web. After studying and comparing these technologies, a pilot implementation was built to see the technologies in action and to compare them with a solution that uses no session replication. The goal is to find the best option for session replication in a production system. All of the implemented replication technologies worked, and the goals set for this thesis were met. The findings of this study will be used as a baseline for production session replication solutions.
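
    The idea of keeping session objects in shared third-party storage can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary in place of the external store and hypothetical class names; the abstract does not name the surveyed solutions, and a production setup would use one of them instead.

```python
class SessionStore:
    """In-memory stand-in for a shared third-party session store (e.g. a database or cache)."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def load(self, session_id: str) -> dict:
        return dict(self._sessions.get(session_id, {}))  # copy, as a remote store would

    def save(self, session_id: str, attributes: dict) -> None:
        self._sessions[session_id] = dict(attributes)


class AppServer:
    """Application server that keeps no session state locally."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def handle(self, session_id: str, key=None, value=None) -> dict:
        session = self.store.load(session_id)      # fetch the replicated session
        if key is not None:
            session[key] = value
            self.store.save(session_id, session)   # write back so every server sees it
        return session


store = SessionStore()
server_a, server_b = AppServer("a", store), AppServer("b", store)
server_a.handle("s1", "cart", ["book"])
# If server "a" now fails, the load balancer can send the user to "b",
# which still finds the session.
print(server_b.handle("s1"))  # {'cart': ['book']}
```

    The trade-off is an extra round trip to the store on each request in exchange for servers that can fail or be restarted without losing user sessions.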