
    Data-driven Models for Remaining Useful Life Estimation of Aircraft Engines and Hard Disk Drives

    Failure of physical devices can cause inconvenience, financial loss, and sometimes even death. To improve the reliability of these devices, we need to know the remaining useful life (RUL) of a device at a given point in time. Data-driven approaches use data from a physical device to build a model that estimates the RUL; they have shown strong performance and are often simpler than traditional model-based approaches. Typical statistical and machine learning approaches are often not suited to sequential data prediction, and while recurrent neural networks are designed for sequential data, they suffer from the vanishing gradient problem over long sequences. I therefore explore the use of Long Short-Term Memory (LSTM) networks for RUL prediction in two experiments. First, I train bidirectional LSTM networks on the Backblaze hard-disk drive dataset, achieving an accuracy of 96.4% on a 60-day time window, state-of-the-art performance. I also use a unique standardization method that standardizes each hard-drive instance independently, and I explore the benefits and downsides of this approach. Second, I train LSTM models on the NASA N-CMAPSS dataset to predict aircraft engine remaining useful life. I train models on each of the eight sub-datasets, achieving an RMSE of 6.304 on one of them, the second-best result in the current literature. I also compare an LSTM network's performance to that of a Random Forest and a Temporal Convolutional Neural Network model, demonstrating the LSTM network's superior performance. I find that LSTM networks are capable predictors of device remaining useful life, and I present a thorough model development process that can be reproduced to develop LSTM models for various RUL prediction tasks. These models can improve the reliability of devices such as aircraft engines and hard-disk drives.
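    The per-instance standardization mentioned in this abstract can be sketched as follows; the function name and array shapes are illustrative assumptions, not the thesis code. Each drive's SMART time series is scaled using only that drive's own statistics, rather than statistics pooled over the whole fleet:

```python
import numpy as np

def standardize_per_instance(series_list):
    """Standardize each device's time series using only that device's
    own statistics, instead of statistics pooled over the dataset."""
    standardized = []
    for x in series_list:            # x: (timesteps, features) for one drive
        mu = x.mean(axis=0)
        sigma = x.std(axis=0)
        sigma[sigma == 0] = 1.0      # guard against constant features
        standardized.append((x - mu) / sigma)
    return standardized
```

    A trade-off this illustrates: per-instance scaling removes drive-to-drive offsets, but it also discards absolute magnitudes that may themselves be predictive of failure.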

    Model-Augmented Estimation of Conditional Mutual Information for Feature Selection

    Markov blanket feature selection, while theoretically optimal, is generally challenging to implement. This is due to the shortcomings of existing approaches to conditional independence (CI) testing, which tend to struggle either with the curse of dimensionality or computational complexity. We propose a novel two-step approach which facilitates Markov blanket feature selection in high dimensions. First, neural networks are used to map features to low-dimensional representations. In the second step, CI testing is performed by applying the k-NN conditional mutual information estimator to the learned feature maps. The mappings are designed to ensure that mapped samples both preserve information and share similar information about the target variable if and only if they are close in Euclidean distance. We show that these properties boost the performance of the k-NN estimator in the second step. The performance of the proposed method is evaluated on both synthetic and real data. Comment: Accepted to UAI 202
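    As a simplified illustration of the k-NN estimation step in the second stage, here is an unconditional Kraskov-style mutual information estimator for 1-D samples; the paper itself applies a k-NN *conditional* MI estimator to learned feature maps, so this is a sketch of the underlying technique only, with assumed names and parameters:

```python
import numpy as np
from scipy.special import digamma
from sklearn.neighbors import NearestNeighbors

def ksg_mutual_information(x, y, k=3):
    """Kraskov-style k-NN estimate of I(X;Y) for 1-D samples x, y."""
    n = len(x)
    xy = np.column_stack([x, y])
    nn = NearestNeighbors(metric="chebyshev").fit(xy)
    dist, _ = nn.kneighbors(xy, n_neighbors=k + 1)  # includes the point itself
    eps = dist[:, -1]                               # distance to k-th neighbor
    # count marginal neighbors strictly within eps of each sample
    nx = np.array([(np.abs(x - x[i]) < eps[i]).sum() - 1 for i in range(n)])
    ny = np.array([(np.abs(y - y[i]) < eps[i]).sum() - 1 for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

    The Chebyshev (max-norm) metric in the joint space is what lets the same radius be reused for the marginal neighbor counts.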

    Transfer Learning for Bayesian Networks with an Application to Hard Disk Drive Failure Prediction

    Predicting hard disk drive failures is very important for avoiding data loss and additional costs, and a considerable effort can be observed toward finding suitable failure prediction methods. Despite the encouraging results achieved by several methods, one notable issue is the lack of available data for building reliable models. Transfer learning offers a valid alternative, since it can be used to transfer knowledge from disk models with abundant data to disk models with less data. In this work, we evaluate transfer learning strategies for this task. We also propose a strategy for building information sources based on grouping similar disk models. Results show that all tested transfer scenarios improve the performance of the prediction methods, especially for disk models with very little data.
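    The grouping of similar disk models into transfer sources could be sketched as below; the feature choice, function name, and clustering algorithm are assumptions for illustration, not the authors' method:

```python
import numpy as np
from sklearn.cluster import KMeans

def group_source_models(model_features, n_groups=2):
    """Cluster disk models by aggregate SMART statistics so that data-rich
    models in a cluster can serve as transfer sources for data-poor models
    in the same cluster (hypothetical sketch)."""
    names = list(model_features)
    X = np.array([model_features[m] for m in names])
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(X)
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    return groups
```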

    Predicting Policy Violations in Policy Based Proactive Systems Management

    The continuous development and advancement of networking, computing, software, and web technologies have led to explosive growth in distributed systems. To ensure better quality of service (QoS), management of large-scale distributed systems is important, and their increasing complexity requires significantly higher levels of automation in system management. The core of autonomic computing is the ability to analyze data about the distributed system and to take action. Such autonomic management should include some ability to anticipate potential problems and act to avoid them; that is, it should be proactive, identifying possible faults before they occur and before they can result in severe degradation in performance. In this thesis, our goal is to predict policy violations and take actions ahead of time in order to achieve proactive management in a policy-based system. We implemented different prediction algorithms to predict policy violations; based on the prediction decisions, proactive actions are carried out in the system. An adaptive proactive action approach is also introduced to increase the performance of the proactive management system.

    Substituting Failure Avoidance for Redundancy in Storage Fault Tolerance

    The primary mechanism for overcoming faults in modern storage systems is to introduce redundancy in the form of replication and error correcting codes. The costs of such redundancy in hardware, system availability, and overall complexity can be substantial, depending on the number and pattern of faults that are handled. This dissertation describes and analyzes, via simulation, a system that seeks to use disk failure avoidance to reduce the need for costly redundancy through adaptive heuristics that anticipate such failures. While a number of predictive factors can be used, this research focuses on the three leading candidates: SMART errors, age, and vintage. This approach can predict where near-term disk failures are more likely to occur, enabling proactive movement/replication of at-risk data, thus maintaining data integrity and availability. This strategy can reduce costs due to redundant storage without compromising these important requirements.
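    A heuristic combining the three predictors named above might look like the following minimal sketch; the weights, saturation points, and threshold are invented for illustration and are not taken from the dissertation:

```python
def failure_risk(smart_errors, age_years, vintage_failure_rate,
                 w_smart=0.5, w_age=0.3, w_vintage=0.2):
    """Combine the three predictors into a single risk score in [0, 1].
    Weights and saturation points are illustrative assumptions."""
    smart_score = min(smart_errors / 10.0, 1.0)   # saturate at 10 errors
    age_score = min(age_years / 5.0, 1.0)         # saturate at 5 years
    return (w_smart * smart_score + w_age * age_score
            + w_vintage * vintage_failure_rate)

def should_migrate(disk, threshold=0.6):
    """Flag at-risk disks so their data can be moved or replicated early."""
    return failure_risk(**disk) >= threshold
```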

    Data Mining Applications to Fault Diagnosis in Power Electronic Systems: A Systematic Review


    Anomaly Detection and Exploratory Causal Analysis for SAP HANA

    Nowadays, the correct functioning of equipment, networks, and systems is key to keeping a company's business running, since in the era of big data companies cannot avoid relying on information technology to support their business. However, the technology is never infallible: faults that give rise to sometimes critical situations may appear at any time. To detect and prevent failures, it is essential to have a good monitoring system that oversees the technology used by a company (hardware, networks and communications, operating systems, applications, and so on) in order to analyze its operation and performance and to detect and alert on possible errors. The aim of this thesis is thus to further advance the fields of anomaly detection and exploratory causal inference, two major research areas in monitoring systems, by providing efficient algorithms with regard to usability, maintainability, and scalability. The results of the analysis can be viewed as a starting point for root cause analysis of system performance issues, helping to avoid system failures or to minimize the time needed to resolve issues in the future. The algorithms were finally run on historical data from the SAP HANA database, and the results obtained in this thesis indicate that the tools succeed in providing useful information for diagnosing performance issues of the system.

    PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation

    Online aggregation provides estimates of the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. This allows for interactive data exploration of even the largest datasets. In this paper we introduce the first framework for parallel online aggregation in which the estimation incurs virtually no overhead on top of the actual execution. We define a generic interface to express any estimation model that completely abstracts the execution details. We design a novel estimator specifically targeted at parallel online aggregation. When executed by the framework over a massive 8 TB TPC-H instance, the estimator provides accurate confidence bounds early in the execution, even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size, and without incurring overhead. Comment: 36 pages
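    The core idea of an online aggregation estimator, a running estimate plus a shrinking confidence bound, can be sketched as below for a SUM query over a uniform sample; this is a generic CLT-based sketch under assumed names, not the paper's estimator:

```python
import math

class OnlineSumEstimator:
    """Estimate total = N * sample_mean with a CLT-based half-width,
    updated one sampled value at a time (Welford's running variance)."""
    def __init__(self, population_size):
        self.N = population_size
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def estimate(self, z=1.96):
        """Return (estimated total, 95% half-width)."""
        var = self.m2 / (self.n - 1) if self.n > 1 else 0.0
        half = z * self.N * math.sqrt(var / self.n) if self.n else 0.0
        return self.N * self.mean, half
```

    The user-facing behavior follows: as more tuples are processed, `half` shrinks, and the computation can be stopped once the bound is tight enough.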

    A Survey on Automatic Parameter Tuning for Big Data Processing Systems

    Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning. Peer reviewed
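    The simplest of the six categories, rule-based tuning, can be illustrated with a common rule-of-thumb for sizing Spark executors; the rules and numbers here are widely circulated heuristics, not taken from the survey:

```python
def rule_based_spark_config(node_memory_gb, node_cores):
    """Hypothetical rule-of-thumb executor sizing for one worker node."""
    # Keep executors at <= 5 cores to avoid HDFS client contention.
    executor_cores = 5 if node_cores >= 5 else node_cores
    # Reserve one core for the OS/daemons, then pack executors.
    executors_per_node = max((node_cores - 1) // executor_cores, 1)
    # Reserve 1 GB for the OS and ~10% per executor for memory overhead.
    mem = int((node_memory_gb - 1) / executors_per_node * 0.9)
    return {
        "spark.executor.cores": executor_cores,
        "spark.executor.instances": executors_per_node,
        "spark.executor.memory": f"{mem}g",
    }
```

    Such static rules are cheap but ignore the workload, which is exactly the gap the other five categories (cost models, simulation, experiments, machine learning, adaptive tuning) try to close.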