12 research outputs found

    Software Aging Analysis of Web Server Using Neural Networks

    Software aging is a phenomenon of progressive performance degradation, transient failures, or even crashes in long-running software systems such as web servers. It mainly occurs due to the deterioration of operating system resources, memory fragmentation, and the accumulation of numerical errors. A basic method to combat software aging is software rejuvenation, a proactive fault-management technique aimed at cleaning up the system's internal state to prevent more severe crash failures in the future. It involves occasionally stopping the running software, cleaning its internal state, and restarting it. An optimized schedule for performing software rejuvenation has to be derived in advance, because a long-running application cannot be taken down arbitrarily without incurring unnecessary cost. This paper proposes a method to derive an accurate and optimized rejuvenation schedule for a web server (Apache) using a Radial Basis Function (RBF) based feed-forward neural network, a variant of Artificial Neural Networks (ANN). Aging indicators, obtained through an experimental setup involving an Apache web server and clients, act as input to the neural network model. This method improves on existing ones because the use of RBF yields better accuracy and faster convergence. Comment: 11 pages, 8 figures, 1 table; International Journal of Artificial Intelligence & Applications (IJAIA), Vol.3, No.3, May 201
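    The RBF idea described above can be illustrated with a small sketch. Everything below is hypothetical, not taken from the paper: a synthetic aging indicator (resident memory vs. uptime), hand-picked Gaussian centers and width, and training of only the linear output layer by stochastic gradient descent.

```python
import math

# Hypothetical 1-D aging predictor: uptime (hours) -> resident memory (MB).
# Gaussian RBF network with fixed centers; only the linear output layer
# (weights + bias) is trained, using plain stochastic gradient descent.

def rbf(x, center, width):
    """Gaussian radial basis activation."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

times = [float(t) for t in range(10)]       # hours of uptime (synthetic)
usage = [100.0 + 5.0 * t for t in times]    # leaked-memory trend (synthetic)

centers = [0.0, 3.0, 6.0, 9.0]              # fixed centers over the input range
width = 2.0
weights = [0.0] * len(centers)
bias = 0.0

def predict(x):
    return bias + sum(w * rbf(x, c, width) for w, c in zip(weights, centers))

lr = 0.05
for _ in range(10000):                      # SGD on the (convex) output layer
    for x, y in zip(times, usage):
        err = predict(x) - y
        bias -= lr * err
        for i, c in enumerate(centers):
            weights[i] -= lr * err * rbf(x, c, width)

# The fitted curve reproduces the upward aging trend of the training data
print(round(predict(0.0), 1), round(predict(9.0), 1))
```

    In practice the centers and widths would be chosen from the observed aging indicators (e.g. by clustering), which is part of what makes the RBF variant converge quickly.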

    Analysis of Software Aging in a Web Server

    A number of recent studies have reported the phenomenon of “software aging”, characterized by progressive performance degradation and/or an increased occurrence rate of hang/crash failures of a software system due to the exhaustion of operating system resources or the accumulation of errors. To counteract this phenomenon, a proactive technique called “software rejuvenation” has been proposed. It essentially involves stopping the running software, cleaning its internal state and/or its environment, and then restarting it. Software rejuvenation, being preventive in nature, raises the question of when to schedule it. Periodic rejuvenation, while straightforward to implement, may not yield the best results, because the rate at which software ages is not constant but depends on the time-varying system workload. Software rejuvenation should therefore be planned and initiated based on the actual system behavior. This requires the measurement, analysis, and prediction of system resource usage. In this paper, we study the evolution of resource usage in a web server while subjecting it to an artificial workload. We first collect data on several system resource usage and activity parameters. Non-parametric statistical methods are then applied to detect and estimate trends in the data sets. Finally, we fit time series models to the collected data. Unlike the models used previously in software aging research, these time series models allow for seasonal patterns, and we show how exploiting the seasonal variation can help in adequately predicting future resource usage. Based on the models employed here, proactive management techniques like software rejuvenation triggered by actual measurements can be built. Keywords: software aging, software rejuvenation, Linux, Apache, web server, performance monitoring, prediction of resource utilization, non-parametric trend analysis, time series analysis
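    The seasonal-prediction idea can be sketched with a much simpler decomposition than the seasonal time series models the paper fits: a least-squares linear trend plus per-position seasonal means. The series, period, and values below are invented for illustration only.

```python
# Trend + seasonal-mean model for resource-usage prediction (illustrative).
# The paper fits proper seasonal time series models; this sketch only shows
# why modeling the seasonal pattern improves the forecast.

def fit_trend(y):
    """Ordinary least squares line y ~ a + b*t."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    b = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / \
        sum((t - t_mean) ** 2 for t in range(n))
    return y_mean - b * t_mean, b

def fit_seasonal(y, period, a, b):
    """Average detrended residual per position in the cycle."""
    buckets = [[] for _ in range(period)]
    for t, v in enumerate(y):
        buckets[t % period].append(v - (a + b * t))
    return [sum(bk) / len(bk) for bk in buckets]

def forecast(t, period, a, b, season):
    return a + b * t + season[t % period]

# Synthetic "memory used" series: upward trend plus a repeating cycle
period = 4
series = [50 + 0.5 * t + [-2, 2, 2, -2][t % period] for t in range(24)]
a, b = fit_trend(series)
season = fit_seasonal(series, period, a, b)
print(forecast(24, period, a, b, season))  # → 60.0
```

    A trend-only model would predict 62.0 for the next step; accounting for the seasonal dip at that cycle position corrects the forecast to 60.0.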

    Near-optimal scheduling and decision-making models for reactive and proactive fault tolerance mechanisms

    As High Performance Computing (HPC) systems increase in size to fulfill computational power demand, the chance of failure occurrences dramatically increases, resulting in potentially large amounts of lost computing time. Fault Tolerance (FT) mechanisms aim to mitigate the impact of failures on running applications. However, the overhead of FT mechanisms increases proportionally to the HPC systems' size. Therefore, challenges arise in handling the expensive overhead of FT mechanisms while minimizing the large amount of computing time lost to failures. In this dissertation, a near-optimal scheduling model is built to determine when to invoke a hybrid checkpoint mechanism, by means of stochastic processes and the calculus of variations. The obtained schedule minimizes the time wasted by the checkpoint mechanism and by failure occurrences. In general, checkpoint/restart mechanisms periodically save the application state and, upon a failure, reload the last saved state. Furthermore, to handle various FT mechanisms, an adaptive decision-making model has been developed to determine the best FT strategy to invoke at each decision point. The best mechanism at each decision point is selected among the considered FT mechanisms to globally minimize the total wasted time for an application execution by means of a dynamic programming approach. In addition, the model is adaptive to changes in the failure rate over time.
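    The trade-off this scheduling model optimizes can be shown with the classic first-order approximation of the optimal checkpoint interval (Young's formula), which balances checkpoint overhead against the work expected to be lost and re-computed after a failure. This is not the dissertation's stochastic model, just the textbook baseline; the numbers are illustrative.

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation of the optimal checkpoint interval.

    Checkpointing too often wastes time writing state; too rarely wastes
    time re-computing lost work after a failure. The optimum balances both.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# 60 s to write a checkpoint, one failure per day on average
interval = young_interval(60.0, 24 * 3600.0)
print(round(interval))  # → 3220 s, i.e. checkpoint roughly every 54 min
```

    A dynamic-programming decision model, as in the dissertation, generalizes this by choosing among several FT mechanisms at each decision point rather than fixing a single periodic interval.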

    A study of software rejuvenation in Apache web servers (Um estudo sobre rejuvenescimento de software em servidores web apache)

    Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Computer Science (Programa de Pós-Graduação em Ciência da Computação)

    Proactive software rejuvenation solution for web environments on virtualized platforms

    The availability of Information Technologies for everything, from everywhere, at all times is a growing requirement. We use Information Technologies for everything from common and social tasks to critical tasks like managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a huge challenge nowadays. A quick look at the news yields reports of corporate outages affecting millions of users and hurting the revenue and image of the companies involved. It is well known that computer system outages are currently more often due to software faults than to hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon: the accumulation of errors, usually causing resource contention, during long-running application executions such as web applications, which normally causes applications or systems to hang or crash. Gradual performance degradation may also accompany software aging. The phenomenon is often related to memory bloat/leaks, unterminated threads, data corruption, unreleased file locks, or buffer overruns. Several examples of software aging can be found in industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution for Internet services against software aging caused by resource exhaustion. To this end, we first present a threshold-based proactive rejuvenation approach to avoid the consequences of software aging. This first approach has some limitations, the most important being the need to know a priori which resource or resources are involved in the crash and their critical values; moreover, some expertise is needed to fix the threshold value that triggers the rejuvenation action.
Due to these limitations, we evaluate the use of Machine Learning to overcome the weaknesses of our first approach and obtain a proactive and predictive solution. Finally, the current and increasing tendency to use virtualization technologies to improve resource utilization has turned traditional data centers into virtualized data centers or platforms. We use a Mathematical Programming approach to virtual machine allocation and migration to optimize resources, accepting as many services as possible on the platform while guaranteeing the availability (via our software rejuvenation proposal) of the deployed services against the software aging phenomenon. The thesis is supported by an exhaustive experimental evaluation that proves the effectiveness and feasibility of our proposals for current systems.
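    A minimal sketch of the threshold-based trigger described above might extrapolate a monitored resource linearly and rejuvenate before its predicted exhaustion. The resource choice, readings, and safety window below are invented for illustration and are not the thesis's actual thresholds.

```python
# Hypothetical threshold-based rejuvenation trigger: extrapolate the
# free-memory trend linearly and rejuvenate before predicted exhaustion.

def time_to_exhaustion(samples, interval_s):
    """Seconds until free memory hits zero, from equally spaced readings."""
    slope = (samples[-1] - samples[0]) / (interval_s * (len(samples) - 1))
    if slope >= 0:
        return float("inf")          # not trending toward exhaustion
    return samples[-1] / -slope

def should_rejuvenate(samples, interval_s, safety_window_s):
    return time_to_exhaustion(samples, interval_s) < safety_window_s

readings = [800.0, 760.0, 720.0, 680.0]   # MB free, sampled every 60 s
print(time_to_exhaustion(readings, 60.0))         # roughly 1020 s left
print(should_rejuvenate(readings, 60.0, 3600.0))  # → True
```

    This also makes the limitation concrete: the monitored resource and the safety window must be known and tuned by an expert, which is exactly what motivates the Machine Learning approach described next in the abstract.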

    Software aging using quantitative accelerated life tests (Envelhecimento de software utilizando ensaios de vida acelerados quantitativos)

    Doctoral thesis - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Production Engineering (Programa de Pós-Graduação em Engenharia de Produção). This research work presents a systematic approach to accelerate the lifetime of systems that fail due to software aging effects. Reliability engineering studies of such systems require observing the times to failure caused by software aging, which normally demands a long observation period. This requirement introduces several practical constraints, especially when the experiment duration implies prohibitive time and cost. The present work proposes a method to accelerate the lifetime of systems that fail due to software aging, reducing the experimentation time needed to observe their failures, which means lower duration and cost for research in this area. The theoretical foundation of the proposed method draws on computing dependability, reliability engineering, design of experiments, accelerated life testing, and the phenomenology of software aging. The acceleration technique adopted was the quantitative accelerated degradation test, which is widely used in several industry areas but until now had not been applied to software products. The main contribution of this research is the specification of the means for applying this technique in experimental software engineering, specifically to the software aging problem. The applicability of the proposed method was evaluated in a case study involving the accelerated aging of a real web server.
Among the main experimental results is the identification of the treatments that contributed most to the aging of the web server software. Based on these treatments, it was possible to define the workload pattern that most influenced the aging of the analyzed web server, the type and size of requested pages being the two most significant factors. Another important result is the finding that variation in the web server's request rate did not influence its aging. Regarding the reduction of the experimentation period, the proposed method showed a shorter duration than values previously reported in the literature for similar experiments, 3.18 times shorter than the shortest time found. In terms of MTBF estimates, obtained with and without aging acceleration, the proposed method reduced the experimentation time by a factor of approximately 687.
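    The core arithmetic of a quantitative accelerated degradation test can be sketched as follows: observe a faster degradation rate under a stress workload, then extrapolate the time to failure under normal load via the acceleration factor. All rates, thresholds, and the linear-degradation assumption below are invented for illustration, not taken from the thesis.

```python
# Accelerated-degradation extrapolation (illustrative numbers only).
# Assumes a linearly degrading metric, e.g. cumulative leaked memory.

def time_to_failure_h(start, threshold, rate_per_hour):
    """Hours until a linearly degrading metric crosses its failure threshold."""
    return (threshold - start) / rate_per_hour

normal_rate = 2.0      # MB of leaked memory per hour under normal load
stress_rate = 120.0    # MB per hour under the accelerated workload
acceleration = stress_rate / normal_rate    # acceleration factor: 60x

observed_ttf = time_to_failure_h(0.0, 1200.0, stress_rate)  # measured in the lab
inferred_ttf = observed_ttf * acceleration                  # inferred normal life
print(observed_ttf, inferred_ttf)  # → 10.0 600.0
```

    A 10-hour stressed experiment thus stands in for 600 hours of normal operation, which is the kind of experimentation-time reduction the thesis quantifies.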

    Improved self-management of datacenter systems applying machine learning

    Autonomic Computing is a Computer Science and Technologies research area that originated in the mid-2000s. It focuses on the optimization and improvement of complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, like multi-datacenter systems in cloud computing, system operators and architects need more help to understand, design, and optimize these systems manually, even more so when the systems are distributed around the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, an important issue when resources have a cost to obtain, run, or maintain. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can build accurate models of system behavior, often with intelligible explanations, and can predict and infer system states and values. Models obtained from automatic learning have the advantage of being easily updated after workload or configuration changes by collecting new examples and re-training the predictors. By employing automatic modeling and prediction, we can find new methods for making "intelligent" decisions and for discovering new information and knowledge about systems. This thesis moves from the existing state of the art, where management is based on administrator expertise, well-known data, ad hoc algorithms and models, and elements studied from a single machine's point of view, to a novel one where management is driven by models learned from the system itself, providing useful feedback and making up for incomplete, missing, or uncertain data, from the point of view of a global network of datacenters.
    - First, we cover the scenario where the decision maker knows all the information about the system: how much each job will consume, what the desired quality of service is and will be, what the deadlines for the workload are, etc., focusing on each component and policy of every element involved in executing these jobs.
    - Then we focus on the scenario where, instead of fixed oracles that provide information from an expert formula or set of conditions, machine learning is used to create those oracles. Here we look at components and specific details while some of the information is unknown and must be learned and predicted.
    - We reduce the problem of optimizing resource allocations and requirements for virtualized web services to a mathematical problem, indicating each factor, variable, and element involved, as well as all the constraints the scheduling process must respect. The scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face a full-datacenter scenario, and we further introduce some information prediction.
    - We complement the model by expanding the predicted elements, studying the main resources (CPU, memory, and IO), which can suffer from noise, inaccuracy, or unavailability. Once learned predictors for certain components improve decision making, the system becomes more "expert-knowledge independent", and research can focus on a scenario where all elements provide noisy, uncertain, or private information. We also introduce new factors into the management optimization, since the context and costs of each datacenter may change, turning the model "multi-datacenter".
    - Finally, we review the cost of placing datacenters depending on green energy sources and distribute the load according to green energy availability.
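    The VM-placement decision modeled above as a Mixed Integer Linear Program can be shown in miniature. A real instance would use an MILP solver over many variables and constraints; this hypothetical toy instance is small enough to solve by exhaustive search while keeping the same objective (minimize powered-on hosts) and constraint (host CPU capacity).

```python
from itertools import product

# Toy VM placement: assign each VM to a host so that CPU capacity is
# respected, minimizing the number of powered-on hosts. Illustrative
# numbers only; the thesis formulates this as an MILP.
vms = [30, 50, 20, 40]     # CPU demand per VM (%)
hosts = [100, 100, 100]    # CPU capacity per host (%)

best = None
for assign in product(range(len(hosts)), repeat=len(vms)):
    load = [0] * len(hosts)
    for vm, h in zip(vms, assign):
        load[h] += vm
    if all(l <= cap for l, cap in zip(load, hosts)):   # capacity constraint
        used = sum(1 for l in load if l > 0)           # objective: hosts on
        if best is None or used < best[0]:
            best = (used, assign)

print(best[0])  # → 2 (total demand 140% cannot fit on one 100% host)
```

    Energy-aware variants, like the green-energy distribution in the last point, change only the objective function (cost per powered-on host per datacenter) while keeping the same constraint structure.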