
    Software failure prediction based on patterns of multiple-event failures

    A fundamental need for software reliability engineering is to comprehend how software systems fail, which means understanding the dynamics that govern different types of failure manifestation. In this research, I present an exploratory study on multiple-event failures, a failure manifestation characterized by sequences of failure events that vary in length, duration, and combination of failure types. This study aims to (i) improve the understanding of multiple-event failures in real software systems, investigating their occurrences, associations, and causes; (ii) propose analysis protocols that take into account multiple-event failure manifestations; and (iii) take advantage of the sequential nature of this type of software failure to perform predictions. The failures analyzed in this research were observed empirically. In total, I analyzed 42,209 real software failures from 644 computers used in different workplaces. The major contributions of this study are a protocol developed to investigate the existence of patterns of failure associations; a protocol to discover patterns of failure sequences; and a prediction approach whose main concept is to calculate the probability that a certain failure event will occur within a time interval upon the occurrence of a particular pattern of preceding failures. I used three methods to tackle the prediction problem: Multinomial Logistic Regression (with and without Ridge regularization), Decision Tree, and Random Forest. These methods were chosen due to the nature of the failure data, in which the failure types must be handled as categorical variables. Initially, I performed a failure association discovery analysis which included only failures from a widely used commercial off-the-shelf Operating System (OS). As a result, I discovered 45 OS failure association patterns with 153,511 occurrences, composed of the same or different failure types and occurring systematically within well-established time intervals. The observed associations suggest the existence of underlying mechanisms governing these failure occurrences, which motivated the improvement of the previous method by creating a protocol to discover patterns of failure sequences using flexible time thresholds and a failure prediction approach. To obtain a comprehensive view of how different software failures may affect each other, both methods were applied to three different samples: the first sample contained only OS failures, the second contained only User Application failures, and the third encompassed both OS and User Application failures. As a result, I found 165, 480, and 640 different failure sequences with thousands of occurrences, respectively. Finally, the proposed approach was able to predict failures with good to high accuracy (86% to 93%).
    CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Doctoral thesis (Tese de Doutorado).
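    The prediction approach described above treats failure types as categorical variables and asks how likely a particular failure is to follow an observed pattern of preceding failures. The following is a minimal sketch of that general setup, not the dissertation's implementation: the failure-type labels, the window of three preceding events, and the synthetic failure log are assumptions made only for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(0)
    failure_types = ["app_crash", "app_hang", "os_crash"]  # hypothetical failure categories
    log = rng.choice(failure_types, size=5000)             # synthetic failure event log

    window = 3                                             # assumed number of preceding events used as predictors
    X_raw = np.array([log[i:i + window] for i in range(len(log) - window)])
    y = log[window:]                                       # the failure type that follows each pattern

    # Failure types are categorical, so one-hot encode each position of the pattern.
    X = OneHotEncoder().fit_transform(X_raw)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Multinomial logistic regression (sklearn's default L2 penalty plays the role of Ridge
    # regularization) and a random forest, two of the three model families named above.
    logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    print("logistic regression accuracy:", logit.score(X_te, y_te))
    print("random forest accuracy:      ", forest.score(X_te, y_te))
    # predict_proba gives, for one observed pattern, the probability of each failure type
    # occurring next, which mirrors the prediction idea described in the abstract.
    print(forest.predict_proba(X_te[:1]))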

    Simple Sensitivity Analysis for Orion GNC

    The performance of Orion flight software, especially its GNC software, is being analyzed by running Monte Carlo simulations of Orion spacecraft flights. The simulated performance is analyzed for conformance with flight requirements, expressed as performance constraints. Flight requirements include guidance (e.g., touchdown distance from target) and control (e.g., control saturation) as well as performance (e.g., heat load constraints). The Monte Carlo simulations disperse hundreds of simulation input variables, covering everything from mass properties to date of launch. We describe in this paper a sensitivity analysis tool (Critical Factors Tool, or CFT) developed to find the input variables, or pairs of variables, which by themselves significantly influence satisfaction of requirements or significantly affect key performance metrics (e.g., touchdown distance from target). Knowing these factors can inform robustness analysis, can indicate where engineering resources are most needed, and could even affect operations. The contributions of this paper include the introduction of novel sensitivity measures, such as estimated success probability, and a technique for determining whether pairs of factors interact dependently or independently. The tool found that input variables such as moments, mass, thrust dispersions, and date of launch were significant factors for the success of various requirements. Examples are shown in this paper, along with a summary and physics discussion of the EFT-1 driving factors that the tool found.
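    As a rough illustration of the kind of single-factor sensitivity measure described above, one can bin a dispersed Monte Carlo input and compare the estimated probability of meeting a requirement across bins; an input whose bins show a large spread in success probability is a candidate driving factor. The sketch below is only an assumed, simplified stand-in, not the Critical Factors Tool itself: the input names, the requirement threshold, and the simulation data are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    n_runs = 10_000

    # Hypothetical dispersed inputs and a hypothetical performance metric.
    mass_offset = rng.normal(0.0, 1.0, n_runs)        # dispersed mass property
    thrust_disp = rng.normal(0.0, 1.0, n_runs)        # dispersed thrust
    touchdown_dist = 2.0 + 0.8 * mass_offset + 0.5 * thrust_disp + rng.normal(0.0, 1.0, n_runs)

    success = touchdown_dist < 3.0                     # assumed requirement: distance below a limit

    def success_prob_by_bin(x, ok, n_bins=5):
        """Estimated probability of meeting the requirement, conditioned on the bin of input x."""
        edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
        bins = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
        return np.array([ok[bins == b].mean() for b in range(n_bins)])

    p_mass = success_prob_by_bin(mass_offset, success)
    p_thrust = success_prob_by_bin(thrust_disp, success)

    # A large spread across bins flags the input as a driving factor for this requirement.
    print("P(success) by mass bin:  ", np.round(p_mass, 3), " spread:", round(float(np.ptp(p_mass)), 3))
    print("P(success) by thrust bin:", np.round(p_thrust, 3), " spread:", round(float(np.ptp(p_thrust)), 3))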

    Safety Performance Prediction of Large-Truck Drivers in the Transportation Industry

    The trucking industry and truck drivers play a key role in the United States commercial transportation sector. Accidents involving large trucks are major events that can cause severe problems for the driver, the company, the customer, and other road users, resulting in property damage and loss of life. The objective of this research is to concentrate on an individual transportation company and use its historical data to build models, based on statistical and machine learning methods, that predict accidents. The focus is on building models that have high accuracy and correctly predict accidents. Logistic regression and penalized logistic regression models were tested initially to obtain some interpretation of the relationship between the predictor variables and the response variable. Random forest, gradient boosting machine (GBM), and deep learning methods are explored to deal with highly non-linear and complex data. The cost of fatal and non-fatal accidents is also discussed, to weigh the difference between training a driver and encountering an accident. Since accidents are very rare events, model accuracy should be balanced between predicting non-accidents (specificity) and predicting accidents (sensitivity). This framework can serve as a baseline for transportation companies to emphasize the benefits of prediction in having safer and more productive drivers.
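    Since the abstract emphasizes balancing sensitivity against specificity for a rare outcome, the sketch below shows one common way to express that trade-off: a class-weighted, L2-penalized logistic regression evaluated with both metrics. The predictors, the synthetic data, and the roughly 2% accident rate are assumptions for illustration only, not the company's data or the study's exact models.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    n = 20_000

    # Hypothetical driver-level predictors, invented for illustration.
    miles_driven = rng.normal(0.0, 1.0, n)
    prior_violations = rng.poisson(0.3, n)
    hours_per_week = rng.normal(0.0, 1.0, n)
    X = np.column_stack([miles_driven, prior_violations, hours_per_week])

    # Rare outcome: roughly 2% of drivers have an accident in this synthetic data.
    lin_pred = -4.0 + 0.6 * miles_driven + 0.8 * prior_violations
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-lin_pred))

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # L2-penalized logistic regression; class_weight="balanced" up-weights the rare accident
    # class so the model does not simply predict "no accident" for every driver.
    model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    print("sensitivity (accidents correctly predicted):    ", round(tp / (tp + fn), 3))
    print("specificity (non-accidents correctly predicted):", round(tn / (tn + fp), 3))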

    L1 methods for shrinkage and correlation

    This dissertation explores the use of the L1 norm in two statistical problems: multiple linear regression and diagnostic checking in time series. In recent years L1 shrinkage methods have become popular in linear regression as they can achieve simultaneous variable selection and parameter estimation. Their objective functions contain a least squares term and an L1 penalty term, which can produce sparse solutions (Fan and Li, 2001). The least absolute shrinkage and selection operator (Lasso) was the first L1 penalized method proposed and has been widely used in practice. But the Lasso estimator has noticeable bias and is inconsistent for variable selection. Zou (2006) proposed the adaptive Lasso and proved its oracle properties under some regularity conditions. We investigate the performance of the adaptive Lasso by applying it to the problem of multiple undocumented change-point detection in climate data. Artificial factors such as relocation of weather stations, recalibration of measurement instruments, and city growth can cause abrupt mean shifts in historical temperature data. These changes do not reflect the true atmospheric evolution and are, unfortunately, often undocumented for various reasons. It is imperative to locate these abrupt mean shifts so that raw data can be adjusted to display only the true atmospheric evolution. We build a special linear model which accounts for long-term temperature change (global warming) through a linear trend and features p = n (the number of variables equals the number of observations). We apply the adaptive Lasso to estimate the underlying sparse model and allow the trend parameter to be unpenalized in the objective function. The Bayesian Information Criterion (BIC) and the CM criterion (Caussinus and Mestre, 2004) are used to select the final model. Multivariate t simultaneous confidence intervals can post-select the change-points detected by the adaptive Lasso to attenuate overestimation. Considering that the oracle properties of the adaptive Lasso are obtained under the condition of linear independence between predictor variables, the adaptive Lasso should be used with caution, since it is not uncommon for real data sets to exhibit multicollinearity. Zou and Hastie (2005) proposed the elastic net, whose objective function involves both L1 and L2 penalties, and claimed its superiority over the Lasso in prediction. This procedure can identify a sparse model due to the L1 penalty and can tackle multicollinearity due to the L2 penalty. Although the Lasso and elastic net are favored over ordinary least squares and ridge regression because of their ability to perform variable selection, in the presence of multicollinearity ridge regression can outperform both the Lasso and the elastic net in prediction. The salient point is that no regression method dominates in all cases (Fan and Li, 2001; Zou, 2006; Zou and Hastie, 2005). One major flaw of both the Lasso and the elastic net is the unnecessary bias introduced by constraining all parameters to be penalized by the same norm. In this dissertation we propose a general and flexible framework for variable selection and estimation in linear regression. Our objective function automatically allows each parameter to be unpenalized, or penalized by the L1 norm, the L2 norm, or both, based on parameter significance and variable correlation. The resulting estimator not only identifies the correct set of significant variables with large probability but also has smaller bias for the nonzero parameters.
    Our procedure is a combinatorial optimization problem which can be solved by exhaustive search or by a genetic algorithm (as a faster surrogate for exhaustive search). Since the aim is a descriptive model, BIC is chosen as the model selection criterion. Another application of the L1 norm considered in this dissertation is portmanteau tests in time series. The first step in time series regression is to determine whether significant serial correlation is present. If initial investigations indicate significant serial correlation, the second step is to fit an autoregressive moving average (ARMA) process to parameterize the correlation function. Portmanteau tests are commonly used to detect serial correlation or to assess the goodness of fit of the ARMA model in these two steps. For small samples the commonly employed Ljung-Box portmanteau test (Ljung and Box, 1978) can have low power, so it is beneficial to have a more powerful small-sample test for detecting significant correlation. We develop such a test by considering the Cauchy estimator of correlation. While the usual sample correlation is estimated through the L2 norm, the Cauchy estimator is based on the L1 norm. Asymptotic properties of the test statistic are obtained. The test compares very favorably with the Box-Pierce/Ljung-Box statistics in detecting autoregressive alternatives.
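    For concreteness, the change-point model sketched above can be written as a penalized least squares objective of the following form; the notation here is assumed for illustration (\delta_k is the mean shift introduced at candidate change-point k and \alpha is the unpenalized trend coefficient), and it is not necessarily the dissertation's exact formulation:

    \min_{\alpha,\;\delta}\; \sum_{t=1}^{n}\Big(y_t - \alpha t - \sum_{k=1}^{n-1}\delta_k\,\mathbf{1}\{t > k\}\Big)^{2} \;+\; \lambda\sum_{k=1}^{n-1} w_k\,|\delta_k|, \qquad w_k = \frac{1}{|\tilde{\delta}_k|^{\gamma}},

    where \tilde{\delta}_k is an initial (e.g., least squares) estimate that defines the adaptive weights, the trend coefficient \alpha is left out of the penalty, nonzero estimates \hat{\delta}_k mark detected change-points, and the amount of shrinkage \lambda (and hence the selected model) is chosen with BIC or the CM criterion.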

    A framework for AI-driven neurorehabilitation training: the profiling challenge

    Cognitive decline is a common sign that a person is ageing. However, abnormal cases can lead to dementia, affecting daily living activities and independent functioning. Dementia is a leading cause of disability and death, and its prevention is a global health priority. One way to address cognitive decline is to undergo cognitive rehabilitation. Cognitive rehabilitation aims to restore or mitigate the symptoms of a cognitive disability, increasing the patient's quality of life. However, cognitive rehabilitation is tied to clinical environments and their logistics, leading to a suboptimal set of expensive tools that are hard to adapt to every patient's needs. The BRaNT project aims to create a tool that mitigates this problem. NeuroAIreh@b is a rehabilitation tool developed within a framework that combines neuropsychological assessments, neurorehabilitation procedures, artificial intelligence, and game design, making it easy to set up in a clinical environment and to adapt to every patient's needs. Among all the challenges within NeuroAIreh@b, one is to represent a cognitive profile through the aggregation of multiple neuropsychological assessments. Testing this possibility requires patient data that is currently unavailable. In the first part of this master's project, we study the possibility of aggregating neuropsychological assessments for the case of Alzheimer's disease using the Alzheimer's Disease Neuroimaging Initiative database. This database contains a vast collection of images and neuropsychological assessments that will serve as a baseline for NeuroAIreh@b when the time comes. In the second part of this project, we set up a computational system to run all the artificial intelligence models and simulations required for the BRaNT project. The system also hosts a database and a web server that serves all the pages required for the project.