
    Analysis of Software Aging in a Web Server

    A number of recent studies have reported the phenomenon of “software aging”, characterized by progressive performance degradation and/or an increasing rate of hang/crash failures of a software system, due to the exhaustion of operating-system resources or the accumulation of errors. To counteract this phenomenon, a proactive technique called “software rejuvenation” has been proposed: it essentially involves stopping the running software, cleaning its internal state and/or its environment, and then restarting it. Because rejuvenation is preventive in nature, the question arises of when to schedule it. Periodic rejuvenation, while straightforward to implement, may not yield the best results, because the rate at which software ages is not constant but depends on the time-varying system workload. Software rejuvenation should therefore be planned and initiated based on the actual system behavior, which requires measuring, analyzing, and predicting system resource usage. In this paper, we study the evolution of resource usage in a web server subjected to an artificial workload. We first collect data on several system resource-usage and activity parameters. Non-parametric statistical methods are then applied to detect and estimate trends in the data sets. Finally, we fit time series models to the collected data. Unlike the models used previously in software-aging research, these time series models allow for seasonal patterns, and we show how exploiting the seasonal variation helps to adequately predict future resource usage. Based on the models employed here, proactive management techniques such as measurement-triggered software rejuvenation can be built.
    Keywords: software aging, software rejuvenation, Linux, Apache, web server, performance monitoring, prediction of resource utilization, non-parametric trend analysis, time series analysis
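
    As a rough illustration of the seasonal modeling this abstract describes, the sketch below fits a Holt-Winters model with an additive daily seasonal component to a synthetic resource-usage series and forecasts the next day. The hourly sampling, the 24-sample period, and the data itself are assumptions for illustration, not the paper's dataset or its exact models.

```python
# Minimal sketch: seasonal time-series forecasting of resource usage.
# The synthetic series combines a slow upward trend (aging), a daily
# seasonal pattern, and noise -- illustrative only.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
t = np.arange(7 * 24)                       # one week of hourly samples
usage = (0.05 * t                           # slow upward trend (aging)
         + 10 * np.sin(2 * np.pi * t / 24)  # daily seasonal pattern
         + rng.normal(0, 1, t.size) + 50)   # noise around a base level

model = ExponentialSmoothing(usage, trend="add",
                             seasonal="add", seasonal_periods=24)
fit = model.fit()
forecast = fit.forecast(24)                 # predict the next day
print(forecast[:5])
```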

    Discrete-time cost analysis for a telecommunication billing application with rejuvenation

    Software rejuvenation is a proactive fault-management technique that has been studied extensively in the recent literature. In this paper, we focus on the example of a telecommunication billing application considered in [1] and develop discrete-time stochastic models to estimate the optimal software rejuvenation schedules. More precisely, two software cost models with rejuvenation are formulated via discrete semi-Markov processes, and the optimal software rejuvenation schedules which minimize the expected cost per unit time in the steady state are derived analytically. Further, we develop statistically nonparametric algorithms to estimate the optimal software rejuvenation schedules, provided that a complete sample of failure times is given. A new statistical device, called the discrete total time on test statistic, is then introduced. Finally, we examine the asymptotic properties of the proposed statistical estimation algorithms through a simulation experiment.
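
    To make the nonparametric estimation idea concrete, here is a minimal sketch that, given a complete sample of failure times, estimates the long-run cost per unit time of rejuvenating at age t and returns the minimizing schedule. The cost structure (c_fail, c_rej) and this age-replacement-style estimator are illustrative assumptions; the paper's actual models are discrete semi-Markov processes and its estimator is built on the discrete total time on test statistic.

```python
# Hedged sketch: pick the rejuvenation age that minimizes the
# empirical long-run cost per unit time (age-replacement style).
import numpy as np

def optimal_rejuvenation_age(failures, c_fail=100.0, c_rej=5.0):
    x = np.sort(np.asarray(failures, dtype=float))
    best_t, best_cost = None, np.inf
    for t in np.unique(x):
        failed = x <= t                  # units failing before rejuvenation
        p_fail = failed.mean()           # empirical P(X <= t)
        # mean cycle length: failure time if it failed, else age t
        cycle = np.where(failed, x, t).mean()
        cost = (c_fail * p_fail + c_rej * (1 - p_fail)) / cycle
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

failures = [12, 15, 20, 22, 30, 35, 40, 41, 55, 60]   # toy sample
print(optimal_rejuvenation_age(failures))
```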

    Near-optimal scheduling and decision-making models for reactive and proactive fault tolerance mechanisms

    As High Performance Computing (HPC) systems increase in size to fulfill computational power demand, the chance of failure occurrences dramatically increases, resulting in potentially large amounts of lost computing time. Fault Tolerance (FT) mechanisms aim to mitigate the impact of failure occurrences on the running applications. However, the overhead of FT mechanisms increases proportionally to the HPC systems' size. Therefore, challenges arise in handling the expensive overhead of FT mechanisms while minimizing the large amount of computing time lost to failures. In this dissertation, a near-optimal scheduling model is built, by means of stochastic processes and the calculus of variations, to determine when to invoke a hybrid checkpoint mechanism. (Checkpoint/restart mechanisms periodically save application states and, upon a failure, reload the saved state.) The obtained schedule minimizes the time wasted by the checkpoint mechanism and by failure occurrences. Furthermore, to handle various FT mechanisms, an adaptive decision-making model has been developed to determine the best FT strategy to invoke at each decision point. The best mechanism at each decision point is selected among the considered FT mechanisms to globally minimize the total waste time for an application execution by means of a dynamic programming approach. In addition, the model adapts to changes in the failure rate over time.
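
    The dissertation derives its schedule via stochastic processes and the calculus of variations; as a simpler, well-known point of reference for checkpoint scheduling, the sketch below computes the classical Young/Daly first-order approximation of the optimal checkpoint interval and its expected waste fraction. The MTBF and checkpoint cost are toy values, not the dissertation's model.

```python
# Sketch: Young/Daly first-order optimal checkpoint interval.
import math

def young_daly_interval(mtbf_s, checkpoint_cost_s):
    """Optimal checkpoint interval ~ sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def waste_fraction(interval_s, mtbf_s, checkpoint_cost_s):
    """First-order waste: checkpoint overhead + rework after failures."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)

mtbf = 24 * 3600.0          # system-level MTBF: one failure per day
c = 300.0                   # 5 minutes to write a checkpoint
tau = young_daly_interval(mtbf, c)
print(f"checkpoint every {tau / 60:.1f} min, "
      f"waste ~ {100 * waste_fraction(tau, mtbf, c):.1f}%")
```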

    Towards Bayesian System Identification: With Application to SHM of Offshore Structures

    Within the offshore industry, Structural Health Monitoring remains a growing area of interest. The oil and gas sectors are faced with ageing infrastructure and are driven by the desire for reliable lifetime extension, whereas the wind energy sector is investing heavily in a large number of structures. This leads to a number of distinct challenges for Structural Health Monitoring which are brought together by one unifying theme: uncertainty. The offshore environment is highly uncertain, existing structures have not been monitored from construction, and the loading and operational conditions they have experienced (among other factors) are not known. For the wind energy sector, the sheer number of structures makes traditional inspection methods costly and, in some cases, dangerous due to the inaccessibility of many wind farms. Structural Health Monitoring attempts to address these issues by providing tools that allow automated online assessment of the condition of structures to aid decision making. The work of this thesis presents a number of Bayesian methods which allow system identification, for Structural Health Monitoring, under uncertainty. The Bayesian approach explicitly incorporates the prior knowledge that is available and combines it with evidence from observed data to form updated beliefs. This is a natural way to approach Structural Health Monitoring, or indeed many engineering problems: it is reasonable to assume that the engineer has some knowledge before attempting to detect, locate, classify, or model damage on a structure, and a framework where this knowledge can be exploited, and the uncertainty in it handled rigorously, is a powerful methodology. The difficulty is that the actual computation of Bayesian results can pose a significant challenge, both computationally and in terms of specifying appropriate models. This thesis aims to present a number of Bayesian tools, each of which leverages the power of the Bayesian paradigm to address a different Structural Health Monitoring challenge. Within this work, the use of Gaussian Process models is presented as a flexible nonparametric Bayesian approach to regression, which is extended to handle dynamic models within the Gaussian Process NARX framework. The challenge in training Gaussian Process models is seldom discussed, and the work shown here aims to offer a quantitative assessment of different learning techniques, including discussion of the choice of cost function for optimisation of hyperparameters and the choice of the optimisation algorithm itself. Although rarely considered, the effects of these choices are demonstrated to be important and to inform the use of a Gaussian Process NARX model for wave load identification on offshore structures. The work is not restricted to Gaussian Process models alone; Bayesian state-space models are also used. The novel use of Particle Gibbs for identification of nonlinear oscillators is shown, and modifications to this algorithm are applied to handle its specific use in Structural Health Monitoring. Alongside this, the Bayesian state-space model is used to perform joint input-state-parameter inference for Operational Modal Analysis, where the use of priors over the parameters and the forcing function (in the form of a Gaussian Process transformed into a state-space representation) provides a methodology for this output-only identification under parameter uncertainty. Interestingly, this method is shown to recover the parameter distributions of the model without compromising the recovery of the loading time-series signal when compared to the case where the parameters are known. Finally, a novel use of an online Bayesian clustering method is presented for performing Structural Health Monitoring in the absence of any available training data. This online method does not require a pre-collected training dataset, nor a model of the structure, and is capable of detecting and classifying a range of operational and damage conditions while in service. This leaves the reader with a toolbox of methods which can be applied, where appropriate, to the identification of dynamic systems with a view to Structural Health Monitoring problems within the offshore industry and across engineering.
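
    As a minimal illustration of the GP-NARX idea, the sketch below regresses the next output of a toy nonlinear dynamic system on lagged outputs and inputs with a Gaussian Process. The system, lag orders, and kernel choice are assumptions for illustration, not the thesis's offshore case study or its training procedure.

```python
# Sketch: GP-NARX style one-step-ahead regression with lagged features.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
u = rng.normal(0, 1, 500)                        # exogenous input (forcing)
y = np.zeros(500)
for t in range(2, 500):                          # toy nonlinear system
    y[t] = 0.8 * y[t - 1] - 0.4 * y[t - 2] + 0.3 * np.tanh(u[t - 1])

# design matrix of lagged terms: [y_{t-1}, y_{t-2}, u_{t-1}] -> y_t
X = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
target = y[2:]

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X[:400], target[:400])
mean, std = gp.predict(X[400:], return_std=True)  # one-step-ahead prediction
rmse = np.sqrt(np.mean((mean - target[400:]) ** 2))
print(f"RMSE: {rmse:.4f}, mean predictive std: {std.mean():.4f}")
```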

    Software aging and rejuvenation: 20 years (1995-2014) - overview and challenges

    Although software aging and rejuvenation (SAR) is a young research field, a great deal of knowledge has been produced in its first 20 years. Nowadays, important scientific journals and conferences include SAR-related topics in their scope of interest. This fast growth and the wide range of dissemination venues make it challenging for researchers to keep track of new findings and trends in the area. In this work, we collected and analyzed SAR research data to detect trends, patterns, and thematic gaps, in order to provide a comprehensive view of this research field over its first 20 years. We adopted the systematic mapping approach to answer research questions such as: How have the main topics investigated in SAR evolved over time? Which aging effects are the most investigated? Which rejuvenation techniques and strategies are used most frequently? (Master's dissertation; funded by CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior.)

    Improved self-management of datacenter systems applying machine learning

    Autonomic Computing is a Computer Science research area, originated in the mid-2000s, that focuses on the optimization and improvement of complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, as with multi-datacenter systems in cloud computing, system operators and architects need more help to understand, design, and optimize these systems, all the more when the systems are distributed around the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, an important issue when resources have a cost to obtain, run, or maintain. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models of system behavior, often with intelligible explanations, and can predict and infer system states and values. Models obtained from automatic learning have the advantage of being easily updated after workload or configuration changes, by retaking examples and retraining the predictors. By employing automatic modeling and prediction, we can find new methods for making "intelligent" decisions and for discovering new information and knowledge about systems. This thesis moves from the current state of the art, where management is based on administrator expertise, well-known data, ad hoc algorithms and models, and elements studied from the point of view of a single machine, to a new state of the art where management is driven by models learned from the system itself, providing useful feedback and compensating for incomplete, missing, or uncertain data, from the point of view of a global network of datacenters. First, we cover the scenario where the decision maker knows every piece of information about the system: how much each job will consume, the current and future desired quality of service, the deadlines for the workload, and so on, focusing on each component and policy of each element involved in executing these jobs. Then we focus on the scenario where, instead of fixed oracles that provide information from an expert formula or set of conditions, machine learning is used to create these oracles; here we look at components and specific details while part of the information is unknown and must be learned and predicted. Next, we reduce the problem of optimizing resource allocations and requirements for virtualized web services to a mathematical problem, indicating each factor, variable, and element involved, along with all the constraints the scheduling process must respect; the scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face the scenario of a full datacenter, and we further introduce information prediction. We then complement the model by expanding the predicted elements, studying the main resources (CPU, memory, and I/O), which can suffer from noise, inaccuracy, or unavailability. Once learned predictors for certain components improve decision making, the system becomes more "expert-knowledge independent", and research can focus on a scenario where all the elements provide noisy, uncertain, or private information. We also introduce new factors into the management optimization, since context and costs may change for each datacenter, turning the model into a "multi-datacenter" one. Finally, we review the cost of placing datacenters depending on green energy sources, and we distribute the load according to green energy availability.
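
    To make the MILP formulation concrete, here is a hedged sketch that assigns jobs to hosts so that total cost is minimized subject to per-host CPU capacity, using the PuLP modeling library. The job demands, host capacities, and costs are toy values; the thesis's actual model includes many more factors (quality of service, deadlines, energy, and migration constraints) than this sketch.

```python
# Sketch: job-to-host placement as a Mixed Integer Linear Program.
from pulp import (LpProblem, LpMinimize, LpVariable, lpSum,
                  LpBinary, PULP_CBC_CMD)

jobs = {"j1": 2.0, "j2": 1.5, "j3": 3.0}        # CPU demand per job
hosts = {"h1": (4.0, 1.0), "h2": (4.0, 1.8)}    # (CPU capacity, cost/unit)

prob = LpProblem("placement", LpMinimize)
x = {(j, h): LpVariable(f"x_{j}_{h}", cat=LpBinary)
     for j in jobs for h in hosts}

# objective: total cost of the CPU demand placed on each host
prob += lpSum(jobs[j] * hosts[h][1] * x[j, h] for j in jobs for h in hosts)
for j in jobs:                                   # each job placed exactly once
    prob += lpSum(x[j, h] for h in hosts) == 1
for h in hosts:                                  # respect CPU capacity
    prob += lpSum(jobs[j] * x[j, h] for j in jobs) <= hosts[h][0]

prob.solve(PULP_CBC_CMD(msg=False))
print({(j, h): x[j, h].value() for j in jobs for h in hosts})
```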