855 research outputs found

    Detection and Classification of Anomalies in Railway Tracks

    In Portugal, railway tracks frequently require maintenance, which leads to stops or delays of services and, consequently, to monetary losses and under-use of equipment. With a Predictive Maintenance system, these problems can be minimized, since such systems continuously monitor machines and/or processes and determine when maintenance is required. Predictive Maintenance systems can be built with machine and/or deep learning algorithms, since these can be trained on high volumes of historical data to provide diagnoses, detect and classify anomalies, and estimate the remaining lifetime of a machine/process. This dissertation contributes to the development of a predictive maintenance system for railway tracks/infrastructure. The main objectives are to detect and classify anomalies in the railway track. To achieve this, unsupervised and semi-supervised algorithms are tested and tuned to determine the one that best adapts to the presented scenario; the algorithms need to be unsupervised or semi-supervised given the few anomalous labels in the dataset. The first step was to survey the most common implementations for detecting and classifying anomalies in predictive maintenance systems; the candidate algorithms were then trained and their hyperparameters tuned. Comparing their performance showed that an Autoencoder artificial neural network was the best fit for the anomaly-identification problem, and thresholds defined over its outputs were then used to classify the anomalies.
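The reconstruction-error thresholding this abstract describes can be sketched as follows. This is an illustrative toy example with synthetic two-sensor data and a minimal linear autoencoder trained by gradient descent; it is not the dissertation's actual model, dataset, or threshold values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" readings from two correlated track sensors, plus injected anomalies.
normal = rng.normal(0, 1, size=(500, 2))
normal[:, 1] = 0.8 * normal[:, 0] + 0.2 * normal[:, 1]
anomalies = rng.normal(0, 1, size=(20, 2)) + np.array([5.0, -5.0])

# Tiny linear autoencoder (2 -> 1 -> 2) trained by gradient descent on normal data only.
W_enc = rng.normal(0, 0.1, size=(2, 1))
W_dec = rng.normal(0, 0.1, size=(1, 2))
lr = 0.01
for _ in range(2000):
    z = normal @ W_enc                      # encode
    err = z @ W_dec - normal                # reconstruction residual
    W_dec -= lr * z.T @ err / len(normal)
    W_enc -= lr * normal.T @ (err @ W_dec.T) / len(normal)

def score(x):
    """Reconstruction error used as the anomaly score."""
    return np.sum((x @ W_enc @ W_dec - x) ** 2, axis=1)

# Thresholds taken from the error distribution on normal data.
train_scores = score(normal)
warn, alarm = np.percentile(train_scores, [95.0, 99.9])
labels = np.where(score(anomalies) > alarm, "anomaly",
                  np.where(score(anomalies) > warn, "suspect", "normal"))
```

Points that reconstruct poorly relative to the normal-data error distribution are flagged, and multiple thresholds give a coarse severity classification.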

    RADIS: a real-time anomaly detection intelligent system for fault diagnosis of marine machinery

    Improved data accessibility has made it possible to implement data-driven models that empower strategies for O&M activities. Such models have been extensively applied to anomaly detection, with the express purpose of detecting data patterns that deviate significantly from normal operational behaviour. Given the preeminent importance of adequately identifying the behaviour of marine systems in the maritime industry, the Real-time Anomaly Detection Intelligent System (RADIS) framework is proposed, consisting of a Long Short-Term Memory-based Variational Autoencoder in tandem with multi-level Otsu's thresholding. RADIS aims to address the gaps currently identified within the maritime industry in relation to data-driven model applications for enabling smart maintenance. To assess the performance of the framework, a case study on 14 parameters obtained from sensors installed on a diesel generator of a tanker ship is introduced to highlight the implementation of RADIS. The results demonstrate the capability of RADIS to be part of a diagnostic analytics tool that will promote the implementation of smart maintenance within the maritime industry, as RADIS detected an average of 92.5% of anomalous instances in the presented case study.
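Otsu's thresholding, which RADIS applies to anomaly scores, picks the cut that maximizes between-class variance of a score histogram. A single-level sketch (RADIS uses the multi-level variant, and the data here are synthetic, not the tanker-ship case study) can be written in plain numpy:

```python
import numpy as np

def otsu_threshold(scores, bins=256):
    """Single-level Otsu: pick the cut maximizing between-class variance."""
    hist, edges = np.histogram(scores, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                       # weight of the low-score class
    w1 = 1 - w0                             # weight of the high-score class
    mu = np.cumsum(p * centers)             # cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu) ** 2 / (w0 * w1)
    between[~np.isfinite(between)] = 0      # empty classes carry no variance
    return centers[np.argmax(between)]

rng = np.random.default_rng(1)
# Bimodal anomaly scores: a dense "normal" mode and a sparse "anomalous" mode.
scores = np.concatenate([rng.normal(0.1, 0.03, 950), rng.normal(0.8, 0.1, 50)])
t = otsu_threshold(scores)
```

The returned threshold falls in the gap between the two modes, separating normal from anomalous instances without any labels.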

    Degradation stage classification via interpretable feature learning

    Predictive maintenance (PdM) advocates the use of machine learning technologies to monitor an asset's health condition and plan maintenance activities accordingly. However, depending on the specific degradation process, some health-related measures (e.g. temperature) may not be informative enough to reliably assess the health stage. Moreover, each measure needs to be properly treated to extract the information linked to the health stage. These issues are usually addressed through manual feature engineering, which results in high management cost and poor generalization capability. In this work, we address this issue by coupling a health stage classifier with a feature learning mechanism. With feature learning, minimally processed data are automatically transformed into informative features. Many effective feature learning approaches are based on deep learning, where features are obtained as a non-linear combination of the inputs; it is therefore difficult to understand each input's contribution to the classification outcome, and thus the reasoning behind the model. Yet these insights are increasingly required to interpret the results and assess the reliability of the model. In this regard, we propose a feature learning approach able to (i) effectively extract high-quality features by processing different input signals, and (ii) provide useful insights about the most informative domain transformations (e.g. Fourier transform or probability density function) of the input signals (e.g. vibration or temperature). The effectiveness of the proposed approach is tested on publicly available real-world datasets about bearings' progressive deterioration and compared with the traditional feature engineering approach.
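The domain transformations mentioned above (time statistics, Fourier transform, probability density) can be illustrated with a toy hand-crafted extractor. This is the traditional feature-engineering baseline the paper compares against, not the paper's learned approach; the signals and fault model are invented for illustration:

```python
import numpy as np

def domain_features(signal, fs=1000):
    """Features from several candidate domain transformations of one signal."""
    feats = {}
    # Time domain: energy and impulsiveness.
    feats["rms"] = float(np.sqrt(np.mean(signal ** 2)))
    feats["kurtosis"] = float(np.mean((signal - signal.mean()) ** 4) / signal.var() ** 2)
    # Frequency domain: dominant frequency of the FFT magnitude spectrum (DC skipped).
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    feats["dominant_hz"] = float(freqs[np.argmax(spectrum[1:]) + 1])
    # Probability density domain: histogram entropy of the amplitude distribution.
    counts, _ = np.histogram(signal, bins=32)
    p = counts[counts > 0] / counts.sum()
    feats["entropy"] = float(-np.sum(p * np.log(p)))
    return feats

t = np.arange(0, 1, 1 / 1000)
healthy = np.sin(2 * np.pi * 50 * t)        # smooth 50 Hz vibration
faulty = healthy.copy()
faulty[::100] += 3.0                        # periodic impulses from a bearing defect
```

Bearing impulses barely move the dominant frequency but sharply raise kurtosis, which is why identifying the most informative transformation per signal matters.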

    Spatiotemporal anomaly detection: streaming architecture and algorithms

    Includes bibliographical references. 2020 Summer. Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field of anomaly detection has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question remains the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled, streaming, multivariate spatiotemporal data. With streaming data, time is of the essence, and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition (SCADA) devices, the internet-of-things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four (4) non-streaming, static anomaly detection multivariate datasets using unsupervised offline traditional machine learning (TML) and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. Extensive experimentation demonstrates that neural networks produce superior detection accuracy over TML techniques. These same neural network architectures can be extended to process unlabeled, streaming spatiotemporal data using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on a federated learning architecture. STADE streaming algorithms are based on geographically unique, persistently executing neural networks using online stochastic gradient descent (SGD).
STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes at a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in DNN-based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud within each continent: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated. The first case study processes commercial air traffic flows, the second case study processes global earthquake measurements, and the third case study processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near real-time identification of anomalies in streaming data originating from (possibly) computationally disadvantaged, geographically dispersed sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Since STADE is domain-independent, these findings can be easily extended to additional application domains and use cases.
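The per-site streaming scorer idea (a persistently executing learner updated online as data arrive) can be sketched with a minimal online mean/variance tracker. This is a stand-in for a site's SGD-trained neural network, with a synthetic stream and an injected anomaly; it is not the SAD implementation:

```python
import numpy as np

class StreamScorer:
    """Online anomaly scorer for one site: exponentially weighted mean/variance."""
    def __init__(self, alpha=0.05):
        self.alpha, self.mean, self.var = alpha, 0.0, 1.0

    def score(self, x):
        z = abs(x - self.mean) / self.var ** 0.5       # score before updating
        self.mean += self.alpha * (x - self.mean)      # incremental updates, akin to
        self.var += self.alpha * ((x - self.mean) ** 2 - self.var)  # one SGD step
        return z

rng = np.random.default_rng(2)
stream = rng.normal(10, 1, 500)
stream[400] = 25.0                                     # injected anomaly
scorer = StreamScorer()
zs = np.array([scorer.score(x) for x in stream])
```

Each site would forward scores like `zs` to a central server (the FAD's role in STADE) for cross-site spatiotemporal analysis; the scorer itself needs constant memory, which suits computationally disadvantaged sites.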

    A real-time data-driven framework for the identification of steady states of marine machinery

    While maritime transportation is the primary means of long-haul transportation of goods to and from the EU, it continues to present a significant number of casualties and fatalities owing to damage to ship equipment; damage attributed to machinery failures during daily ship operations. Therefore, the implementation of state-of-the-art inspection and maintenance activities is of paramount importance to adequately ensure the proper functioning of systems. Accordingly, the Internet of Ships paradigm has emerged to guarantee the interconnectivity of maritime objects. Such technology is still in its infancy, and thus several challenges need to be addressed. An example is data preparation, which is critical to ensure data quality while avoiding biased results in further analysis to enhance transportation operations. As part of developing a real-time intelligent system to assist with instant decision-making strategies that enhance ship and systems availability, operability, and profitability, a data-driven framework for the identification of steady states of marine machinery based on image generation and connected component analysis is proposed. The identification of such states is of preeminent importance, as non-operational states may adversely alter the results obtained. A case study of three diesel generators of a tanker ship is introduced to validate the developed framework. The results demonstrate that the proposed model outperforms the widely implemented clustering models k-means and GMM with the EM algorithm. As such, the proposed framework can identify steady states appropriately in real time, whilst ensuring computational efficiency and model effectiveness.
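To make the steady-state identification task concrete, a naive baseline (of the kind such a framework would be compared against, alongside k-means and GMMs) flags samples whose rolling variability is low. The signal, window, and tolerance below are invented for illustration, not taken from the paper:

```python
import numpy as np

def steady_states(signal, window=20, tol=1.5):
    """Flag samples whose rolling standard deviation stays below `tol`."""
    pad = window // 2
    flags = np.zeros(len(signal), dtype=bool)
    for i in range(pad, len(signal) - pad):
        flags[i] = signal[i - pad:i + pad].std() < tol
    return flags

rng = np.random.default_rng(3)
transient = np.linspace(0, 100, 200) + rng.normal(0, 1, 200)   # generator ramp-up
plateau = 100 + rng.normal(0, 0.5, 300)                        # steady operation
power = np.concatenate([transient, plateau])
flags = steady_states(power)
```

Only the plateau is flagged as steady; discarding the transient samples before any further analysis is what prevents non-operational states from biasing the results.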

    Anomaly detection and explanation in big data

    2021 Spring. Includes bibliographical references. Data quality tests are used to validate the data stored in databases and data warehouses, and to detect violations of syntactic and semantic constraints. Domain experts grapple with the issues related to the capturing of all the important constraints and checking that they are satisfied. The constraints are often identified in an ad hoc manner based on the knowledge of the application domain and the needs of the stakeholders. Constraints can exist over single or multiple attributes as well as records involving time series and sequences. The constraints involving multiple attributes can involve both linear and non-linear relationships among the attributes. We propose ADQuaTe as a data quality test framework that automatically (1) discovers different types of constraints from the data, (2) marks records that violate the constraints as suspicious, and (3) explains the violations. Domain knowledge is required to determine whether or not the suspicious records are actually faulty. The framework can incorporate feedback from domain experts to improve the accuracy of constraint discovery and anomaly detection. We instantiate ADQuaTe in two ways to detect anomalies in non-sequence and sequence data. The first instantiation (ADQuaTe2) uses an unsupervised approach called autoencoder for constraint discovery in non-sequence data. ADQuaTe2 is based on analyzing records in isolation to discover constraints among the attributes. We evaluate the effectiveness of ADQuaTe2 using real-world non-sequence datasets from the human health and plant diagnosis domains. We demonstrate that ADQuaTe2 can discover new constraints that were previously unspecified in existing data quality tests, and can report both previously detected and new faults in the data.
We also use non-sequence datasets from the UCI repository to evaluate the improvement in the accuracy of ADQuaTe2 after incorporating ground truth knowledge and retraining the autoencoder model. The second instantiation (IDEAL) uses an unsupervised LSTM-autoencoder for constraint discovery in sequence data. IDEAL analyzes the correlations and dependencies among data records to discover constraints. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies from these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy of the approach improves after incorporating ground truth knowledge about the injected faults and retraining the LSTM-Autoencoder model. The novelty of this research lies in the development of a domain-independent framework that effectively and efficiently discovers different types of constraints from the data, detects and explains anomalous data, and minimizes false alarms through an interactive learning process
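The discover-constraints-then-flag-violations pipeline can be illustrated with linear regressions in place of ADQuaTe's autoencoders. The data, the hidden constraint `c = 2a + b`, and the injected violation are all invented for this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
# Records with a hidden linear constraint: c = 2a + b (plus small noise).
a = rng.normal(0, 1, 300)
b = rng.normal(0, 1, 300)
c = 2 * a + b + rng.normal(0, 0.01, 300)
X = np.column_stack([a, b, c])
X[7, 2] += 5.0                                  # inject one constraint violation

def residuals(X, j):
    """Regress attribute j on the others; residuals measure constraint violation."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    return X[:, j] - others @ coef

R = np.column_stack([residuals(X, j) for j in range(X.shape[1])])
scores = np.abs(R).max(axis=1)                  # per-record suspiciousness
suspicious = int(np.argmax(scores))             # most suspicious record
# R[suspicious] shows which discovered relationships the record violates.
```

The residual vector of the flagged record serves as the explanation: large entries indicate which inter-attribute relationships broke, and a domain expert then decides whether the record is actually faulty.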

    Machine learning for the sustainable energy transition: a data-driven perspective along the value chain from manufacturing to energy conversion

    According to the special report Global Warming of 1.5 °C of the IPCC, climate action is not only necessary but more urgent than ever. The world is witnessing rising sea levels, heat waves, flooding events, droughts, and desertification resulting in the loss of lives and damage to livelihoods, especially in countries of the Global South. To mitigate climate change and meet the commitments of the Paris Agreement, it is of the utmost importance to reduce greenhouse gas emissions coming from the most emitting sector, namely the energy sector. To this end, large-scale penetration of renewable energy systems into the energy market is crucial for the energy transition toward a sustainable future by replacing fossil fuels and improving access to energy with socio-economic benefits. With the advent of Industry 4.0, Internet of Things technologies have been increasingly applied to the energy sector, introducing the concept of the smart grid or, more generally, the Internet of Energy. These paradigms are steering the energy sector towards more efficient, reliable, flexible, resilient, safe, and sustainable solutions with huge potential environmental and social benefits. To realize these concepts, new information technologies are required, and among the most promising possibilities are Artificial Intelligence and Machine Learning, which in many countries have already revolutionized the energy industry. This thesis presents different Machine Learning algorithms and methods for the implementation of new strategies to make renewable energy systems more efficient and reliable. It presents various learning algorithms, highlighting their advantages and limits, and evaluating their application to different tasks in the energy context. In addition, different techniques are presented for the preprocessing and cleaning of time series, nowadays collected by sensor networks mounted on every renewable energy system.
With the possibility to install large numbers of sensors that collect vast amounts of time series, it is vital to detect and remove irrelevant, redundant, or noisy features and alleviate the curse of dimensionality, thus improving the interpretability of predictive models, speeding up their learning process, and enhancing their generalization properties. Therefore, this thesis discusses the importance of dimensionality reduction in sensor networks mounted on renewable energy systems and, to this end, presents two novel unsupervised algorithms. The first approach maps time series into the network domain through visibility graphs and uses a community detection algorithm to identify clusters of similar time series and select representative parameters. This method can group both homogeneous and heterogeneous physical parameters, even when related to different functional areas of a system. The second approach proposes the Combined Predictive Power Score, a method for feature selection with a multivariate formulation that explores multiple expanding subsets of variables and identifies the combination of features with the highest predictive power over specified target variables. This method proposes a selection algorithm for the optimal combination of variables that converges to the smallest set of predictors with the highest predictive power. Once the combination of variables is identified, the most relevant parameters in a sensor network can be selected to perform dimensionality reduction. Data-driven methods open the possibility to support strategic decision-making, resulting in a reduction of Operation & Maintenance costs, machine faults, repair stops, and spare parts inventory size. Therefore, this thesis presents two approaches in the context of predictive maintenance to improve the lifetime and efficiency of the equipment, based on anomaly detection algorithms.
The first approach proposes an anomaly detection model based on Principal Component Analysis that is robust to false alarms, can isolate anomalous conditions, and can anticipate equipment failures. The second approach has at its core a neural architecture, namely a Graph Convolutional Autoencoder, which models the sensor network as a dynamical functional graph by simultaneously considering the information content of individual sensor measurements (graph node features) and the nonlinear correlations existing between all pairs of sensors (graph edges). The proposed neural architecture can capture hidden anomalies even when the turbine continues to deliver the power requested by the grid and can anticipate equipment failures. Since the model is unsupervised and completely data-driven, this approach can be applied to any wind turbine equipped with a SCADA system. When it comes to renewable energies, the unschedulable uncertainty due to their intermittent nature represents an obstacle to the reliability and stability of energy grids, especially when dealing with large-scale integration. Nevertheless, these challenges can be alleviated if the natural sources or the power output of renewable energy systems can be forecasted accurately, allowing power system operators to plan optimal power management strategies to balance the dispatch between intermittent power generations and the load demand. To this end, this thesis proposes a multi-modal spatio-temporal neural network for multi-horizon wind power forecasting. In particular, the model combines high-resolution Numerical Weather Prediction forecast maps with turbine-level SCADA data and explores how meteorological variables on different spatial scales together with the turbines' internal operating conditions impact wind power forecasts. The world is undergoing a third energy transition with the main goal to tackle global climate change through decarbonization of the energy supply and consumption patterns. 
This is not only possible thanks to global cooperation and agreements between parties, power generation systems advancements, and Internet of Things and Artificial Intelligence technologies, but also necessary to prevent the severe and irreversible consequences of climate change that are threatening life on the planet as we know it. This thesis is intended as a reference for researchers who want to contribute to the sustainable energy transition and are approaching the field of Artificial Intelligence in the context of renewable energy systems.
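The first predictive-maintenance approach above, Principal Component Analysis-based anomaly detection, is conventionally done by monitoring the reconstruction error outside the principal subspace (the SPE/Q statistic). A minimal sketch on synthetic multi-sensor data (the channels, latent factors, and fault are invented, not the thesis's wind-turbine data):

```python
import numpy as np

rng = np.random.default_rng(5)
# Correlated SCADA-like channels: 5 sensors driven by 2 latent operating factors.
latent = rng.normal(0, 1, size=(400, 2))
mixing = rng.normal(0, 1, size=(2, 5))
X = latent @ mixing + rng.normal(0, 0.1, size=(400, 5))

# Fit PCA on (assumed healthy) training data.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:2].T                                   # retained principal directions

def q_statistic(x):
    """Squared residual outside the principal subspace (SPE / Q statistic)."""
    r = (x - mu) - ((x - mu) @ P) @ P.T
    return np.sum(r ** 2, axis=-1)

limit = np.percentile(q_statistic(X), 99)      # empirical control limit
faulty = X[0].copy()
faulty[2] += 3.0                               # one sensor drifts off-model
```

A sample that breaks the learned sensor correlations exceeds the control limit even if every individual channel stays within its normal range, which is what makes the approach robust to single-channel threshold alarms.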

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time for enhanced efficiency and quality of service delivery, as well as upgraded safety and security in private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well, and also carries over to unobserved data, is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to adopt sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing users to the most anomalous/malicious areas of a time series and the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that utilize the related contexts to reclassify observations and post-prune unjustified events.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results based on concepts understandable by a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains like cybersecurity and health.
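The context-based post-pruning of unjustified events mentioned above can be sketched as a simple filter: an alarm is kept only if its temporal context supports it. The window size, support count, and flag sequence are illustrative assumptions, not the thesis's actual technique:

```python
import numpy as np

def prune_alarms(flags, context=3, min_support=2):
    """Keep an alarm only if enough flags fall inside its context window."""
    flags = np.asarray(flags, dtype=bool)
    kept = np.zeros_like(flags)
    for i in np.flatnonzero(flags):
        lo, hi = max(0, i - context), min(len(flags), i + context + 1)
        kept[i] = flags[lo:hi].sum() >= min_support
    return kept

raw = np.array([0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
pruned = prune_alarms(raw)   # isolated flags at 2 and 13 are dropped
```

Isolated one-off flags, a common source of false alarms in streaming detectors, are reclassified as normal, while sustained runs of flags survive.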

    Using supervised and one-class automated machine learning for predictive maintenance

    Predictive Maintenance (PdM) is a critical area that is benefiting from the advent of Industry 4.0. Recently, several attempts have been made to apply Machine Learning (ML) to PdM, with the majority of the research studies assuming expert-based ML modeling. In contrast with these works, this paper explores purely Automated Machine Learning (AutoML) modeling for PdM under two main approaches. Firstly, we adapt and compare ten recent open-source AutoML technologies focused on Supervised Learning. Secondly, we propose a novel AutoML approach focused on One-Class (OC) Learning (AutoOneClass) that employs Grammatical Evolution (GE) to search for the best PdM model using three types of learners (OC Support Vector Machines, Isolation Forests, and deep Autoencoders). Using recently collected data from a Portuguese software company client, we performed a benchmark comparison study with the Supervised AutoML tools and the proposed AutoOneClass method to predict the number of days until the next failure of an equipment and also to determine if the equipment will fail in a fixed number of days. Overall, the results were close among the compared AutoML tools, with supervised AutoGluon obtaining the best results for all ML tasks. Moreover, the best supervised AutoML and AutoOneClass predictive results were compared with two manual ML modeling approaches (using an ML expert and a non-ML expert), revealing competitive results. This work was executed under the project Cognitive CMMS - Cognitive Computerized Maintenance Management System, NUP: POCI-01-0247-FEDER-033574, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020. We wish to thank the anonymous reviewers for their helpful comments.
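One-class learning, the setting AutoOneClass targets, trains on normal operation only and scores new points by how far they sit from that data. A minimal distance-based sketch (a k-nearest-neighbor scorer standing in for the paper's OC-SVM / Isolation Forest / Autoencoder learners, on invented data):

```python
import numpy as np

def oneclass_score(train, x, k=5):
    """One-class anomaly score: distance to the k-th nearest healthy example."""
    d = np.sqrt(((train - x) ** 2).sum(axis=1))
    return np.sort(d)[k - 1]

rng = np.random.default_rng(6)
healthy = rng.normal(0, 1, size=(300, 4))      # readings from normal operation only
# Calibrate a decision threshold by leave-one-out scoring of the healthy data.
idx = np.arange(len(healthy))
loo = [oneclass_score(healthy[idx != i], healthy[i]) for i in idx]
threshold = np.percentile(loo, 99)
failure = np.full(4, 4.0)                      # equipment state far from the healthy cloud
```

No failure labels are needed to fit or calibrate the detector, which is exactly why one-class learners suit PdM datasets with few recorded failures.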