
    Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees

    This paper investigates an important problem in stream mining, i.e., classification under streaming emerging new classes (SENC). The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach that uses unsupervised learning as the basis for solving this problem. The SENC problem can be decomposed into three subproblems: detecting emerging new classes, classifying instances of known classes, and updating models to enable classification of instances of the new class and detection of further emerging new classes. The proposed method employs completely random trees, which have been shown in the literature to work well in unsupervised learning and in supervised learning independently. As far as we know, this is the first time that completely random trees are used as a single common core to solve all three subproblems: unsupervised learning, supervised learning, and model update in data streams. We show that the proposed unsupervised-learning-focused method often achieves significantly better outcomes than existing classification-focused methods.
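    A minimal sketch of how completely random trees can serve both detection and classification, assuming a toy 2-D stream and our own simplifications. Each tree picks a random attribute and a random split value and ignores labels while growing, which is what makes it usable as an unsupervised detector. Each leaf keeps a bounding ball and a majority label, and an instance that falls outside the ball in most trees is flagged as a possible emerging new class. The ball test, the 50% vote, and the names `build_tree`, `route` and `predict` are illustrative assumptions, not the paper's exact procedure, and the model-update subproblem is omitted.

```python
import math
import random
from collections import Counter


class Leaf:
    """Leaf storing a bounding ball and the majority label of its training points."""

    def __init__(self, points, labels):
        dim = len(points[0])
        self.center = [sum(p[j] for p in points) / len(points) for j in range(dim)]
        self.radius = max(math.dist(p, self.center) for p in points)
        self.label = Counter(labels).most_common(1)[0][0]


class Split:
    def __init__(self, attr, value, left, right):
        self.attr, self.value, self.left, self.right = attr, value, left, right


def build_tree(points, labels, depth, max_depth, rng):
    """Completely random tree: random attribute, random split value, labels ignored."""
    dim = len(points[0])
    varying = [j for j in range(dim)
               if min(p[j] for p in points) < max(p[j] for p in points)]
    if depth >= max_depth or len(points) <= 1 or not varying:
        return Leaf(points, labels)
    attr = rng.choice(varying)
    lo = min(p[attr] for p in points)
    hi = max(p[attr] for p in points)
    value = rng.uniform(lo, hi)
    left = [i for i, p in enumerate(points) if p[attr] < value]
    right = [i for i, p in enumerate(points) if p[attr] >= value]
    if not left or not right:  # degenerate split: value landed on a boundary
        return Leaf(points, labels)
    return Split(
        attr, value,
        build_tree([points[i] for i in left], [labels[i] for i in left], depth + 1, max_depth, rng),
        build_tree([points[i] for i in right], [labels[i] for i in right], depth + 1, max_depth, rng),
    )


def route(node, x):
    """Follow the random splits down to the leaf that contains x."""
    while isinstance(node, Split):
        node = node.left if x[node.attr] < node.value else node.right
    return node


def predict(forest, x):
    """Return 'NEW' if most trees place x outside their leaf ball, else the majority label."""
    outside = 0
    votes = Counter()
    for tree in forest:
        leaf = route(tree, x)
        if math.dist(x, leaf.center) > leaf.radius:
            outside += 1
        else:
            votes[leaf.label] += 1
    if outside > len(forest) / 2:
        return "NEW"
    return votes.most_common(1)[0][0]


rng = random.Random(0)
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = ["a"] * 4 + ["b"] * 4
forest = [build_tree(X, y, 0, 20, rng) for _ in range(25)]
print(predict(forest, (0, 0)), predict(forest, (100, 100)))
```

    Because the trees never look at labels while splitting, the same structure can be grown on unlabelled data, which is the property the abstract relies on.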

    Oil and Gas flow Anomaly Detection on offshore naturally flowing wells using Deep Neural Networks

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    The Oil and Gas industry faces more challenges than ever before. It is criticized as dirty and polluting, which drives demand for green alternatives. Nevertheless, the world still relies heavily on hydrocarbons, which remain the most traditional and stable source of energy, as opposed to the extensively promoted hydro, solar, and wind power. Major operators are challenged to produce oil more efficiently, with a smaller climate footprint and more scrutinized expenditure, to counteract the newly arising energy sources, and they face considerable skepticism about the industry's future. The industry has to become greener and act in ways not required previously. Because most of the tools used by the hydrocarbon exploration and production (E&P) industry are expensive and have been in use for many years, it is paramount for the industry's survival and prosperity to apply predictive maintenance technologies that foresee potential failures, making production safer, lowering downtime, increasing productivity, and reducing maintenance costs. Many efforts have been made to define the most accurate and effective predictive methods; however, data scarcity limits the speed and scope of further experimentation. Given that it would be highly beneficial for the industry to invest in Artificial Intelligence, this research explores the subject of Anomaly Detection in depth, using open public data from Petrobras that was developed by experts. Deep Learning Neural Networks, namely Recurrent Neural Networks with LSTM and GRU backbones, were implemented for multi-class classification of undesirable events in naturally flowing wells. Several hyperparameter optimization tools were also explored, focusing mainly on Genetic Algorithms as the most advanced methods for this kind of task.
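    For reference, one common textbook formulation of a single GRU layer (the building block that the dissertation stacks twice) is given below, where $x_t$ is the input at time step $t$, $h_t$ the hidden state (its dimension is the "number of hidden units"), $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication. In a stacked GRU, the sequence $h_1, \dots, h_T$ of the first layer is the input sequence of the second. Implementations differ in details (some swap the roles of $z_t$ and $1 - z_t$, or apply the reset gate after the matrix product), so this is a generic sketch rather than the exact configuration used in the dissertation.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```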
The research concluded that the best-performing model was a 2-layer stacked GRU with the hyperparameter vector [1, 47, 40, 14], i.e., timestep 1, 47 hidden units, 40 epochs, and batch size 14, which achieved an F1 score of 0.97. As the world faces many issues, among them the detrimental effect of heavy industries on the environment and the resulting adverse global climate change, this project is an attempt to contribute to applying Artificial Intelligence in the Oil and Gas industry, with the intention of making it more efficient, transparent, and sustainable.
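    A Genetic Algorithm search over the hyperparameter vector [timestep, hidden units, epochs, batch size] can be sketched as follows. This is a minimal illustration under stated assumptions: the search ranges in `BOUNDS` are hypothetical, and `toy_fitness` is a hypothetical stand-in for the expensive step of training the GRU and returning its validation F1 score. Truncation selection, uniform crossover, random-reset mutation and elitism form one common GA configuration, not necessarily the one used in the dissertation.

```python
import random

# Hypothetical search ranges for [timestep, hidden units, epochs, batch size].
BOUNDS = [(1, 10), (8, 128), (5, 100), (8, 128)]


def toy_fitness(genes):
    """Hypothetical stand-in for 'train the GRU, return validation F1'.

    A smooth bowl peaked at an arbitrary optimum, so the example runs instantly.
    """
    optimum = (2, 64, 30, 32)
    return -sum(((g - o) / (hi - lo)) ** 2
                for g, o, (lo, hi) in zip(genes, optimum, BOUNDS))


def evolve(fitness, bounds, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [list(ranked[0])]  # elitism: best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # random-reset mutation
                    child[i] = rng.randint(lo, hi)
            children.append(child)
        pop = children
    return max(pop, key=fitness), history


best, history = evolve(toy_fitness, BOUNDS)
print(best, history[-1])
```

    Elitism guarantees that the best fitness never decreases between generations, which matters when each fitness evaluation is a full training run.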

    Root Cause Analysis Of Productivity Losses In Manufacturing Systems Utilizing Ensemble Machine Learning

    In today’s rapidly evolving landscape of automation and manufacturing systems, the efficient resolution of productivity losses is paramount. This study introduces a data-driven ensemble approach that uses cyclic multivariate time series data from binary sensors and signals from Programmable Logic Controllers (PLCs) within these systems. The objective is to automatically analyze productivity losses per cycle and pinpoint their root causes by assigning each loss to a system element. The ensemble approach introduced in this publication integrates various methods, including information theory and machine learning behavior models, to provide a robust analysis for each production cycle. To expedite the resolution of productivity losses and ensure short response times, stream processing becomes a necessity. Accordingly, the approach is implemented as data-stream analysis, can be transferred to batch processing, and integrates seamlessly into existing systems without the need for extensive historical data analysis. This method has two positive effects. First, by identifying the likely root cause of a productivity loss, the analysis shortens the period of lower productivity. Second, the results are more reliable owing to the ensemble approach, which reduces dependence on technical experts. The approach is validated using a semi-automated welding manufacturing system, an injection molding automation system, and a synthetically generated test PLC dataset. The results demonstrate the method's efficacy in offering a data-driven understanding of process behavior and mark an advancement in autonomous manufacturing system analysis.
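    A toy illustration of the per-cycle ensemble idea, assuming each streamed cycle is summarized as the duration of every system element's step. The element names (`clamp`, `weld`, `unload`) and the three scoring rules are hypothetical stand-ins; the paper's own ensemble uses information-theoretic and machine learning behavior models on binary PLC signals. Each rule scores the elements against historical normal cycles and votes for the element it blames, and the majority vote names the likely root cause.

```python
from collections import Counter
from statistics import mean, stdev


def absolute_excess(cycle, history):
    """Extra seconds each element took compared with its historical mean."""
    return {e: cycle[e] - mean(h[e] for h in history) for e in cycle}


def relative_excess(cycle, history):
    """Ratio of observed to historical mean duration."""
    return {e: cycle[e] / mean(h[e] for h in history) for e in cycle}


def z_score(cycle, history):
    """Deviation in units of the historical standard deviation."""
    scores = {}
    for e in cycle:
        values = [h[e] for h in history]
        spread = stdev(values) or 1e-9  # guard against zero variance
        scores[e] = (cycle[e] - mean(values)) / spread
    return scores


METHODS = (absolute_excess, relative_excess, z_score)


def root_cause(cycle, history, slack=0.05):
    """Return the element blamed by most methods, or None if the cycle was not slow."""
    expected = sum(mean(h[e] for h in history) for e in cycle)
    if sum(cycle.values()) <= expected * (1 + slack):
        return None  # no productivity loss worth analyzing in this cycle
    votes = Counter()
    for method in METHODS:
        scores = method(cycle, history)
        votes[max(scores, key=scores.get)] += 1
    return votes.most_common(1)[0][0]


history = [
    {"clamp": 2.0, "weld": 5.0, "unload": 1.0},
    {"clamp": 2.1, "weld": 5.1, "unload": 1.1},
    {"clamp": 1.9, "weld": 4.9, "unload": 0.9},
]
print(root_cause({"clamp": 2.0, "weld": 5.0, "unload": 3.0}, history))
```

    Each call handles one cycle in isolation, so the same function can be applied to a live stream or to a stored batch of cycles.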

    Fault Diagnosis and Failure Prognostics of Lithium-ion Battery based on Least Squares Support Vector Machine and Memory Particle Filter Framework

    A novel data-driven approach is developed for fault diagnosis and remaining useful life (RUL) prognostics of lithium-ion batteries using a Least Squares Support Vector Machine (LS-SVM) and a Memory-Particle Filter (M-PF). Unlike traditional data-driven models for capacity fault diagnosis and failure prognosis, which require multidimensional physical characteristics, the proposed algorithm uses only two variables: Energy Efficiency (EE) and Work Temperature. The aim of this novel framework is to improve the accuracy of incipient and abrupt fault diagnosis and of failure prognosis. First, the LS-SVM is used to generate a residual signal based on the capacity fade trends of the Li-ion batteries. Second, an adaptive threshold model is developed based on several factors, including input, output model error, disturbance, and drift parameter. The adaptive threshold addresses the shortcomings of a fixed threshold. Third, the M-PF is proposed as a new failure prognostic method to determine the RUL. The M-PF assumes that real-time observations and historical data are available, so that historical failure data can be used instead of the physical failure model within the particle filter. The feasibility of the framework is validated using Li-ion battery prognostic data obtained from the National Aeronautics and Space Administration (NASA) Ames Prognostics Center of Excellence (PCoE). The experimental results show the following: (1) the input data require fewer dimensions than traditional empirical models; (2) the proposed diagnostic approach provides an effective way of diagnosing Li-ion battery faults; (3) the proposed prognostic approach can predict the RUL of Li-ion batteries with small error and high prediction accuracy; and (4) the proposed prognostic approach shows that historical failure data can be used instead of a physical failure model in the particle filter.
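    The residual-generation step can be sketched with a plain-Python LS-SVM regressor, which solves the standard linear system $\begin{bmatrix}0 & \mathbf{1}^\top \\ \mathbf{1} & K + I/\gamma\end{bmatrix}\begin{bmatrix}b \\ \alpha\end{bmatrix} = \begin{bmatrix}0 \\ y\end{bmatrix}$. This is a minimal sketch under loud assumptions: the input is a single hypothetical scalar (e.g. a normalized cycle index) rather than EE and work temperature, the RBF kernel and the values of `gamma` and `sigma` are illustrative, and `detect_faults` uses a simple rolling-spread threshold as a stand-in for the paper's adaptive threshold model. The M-PF prognostic stage is not shown.

```python
import math
from statistics import pstdev


def rbf(a, b, sigma):
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(xs, ys, gamma=100.0, sigma=0.5):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(xs[i], xs[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    b, *alpha = solve(A, [0.0] + list(ys))
    return lambda x: b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs))


def detect_faults(model, xs, observed, floor=0.05, k=3.0, window=5):
    """Flag residuals exceeding a threshold that adapts to the recent residual spread."""
    residuals, flagged = [], []
    for t, (x, y) in enumerate(zip(xs, observed)):
        r = y - model(x)
        recent = residuals[-window:]
        threshold = floor + k * (pstdev(recent) if recent else 0.0)
        if abs(r) > threshold:
            flagged.append(t)
        residuals.append(r)
    return flagged


xs = [float(i) for i in range(10)]
healthy = [1.0 - 0.02 * x for x in xs]  # toy linear capacity fade
model = lssvm_fit(xs, healthy)
observed = list(healthy)
observed[7] -= 0.2  # injected abrupt capacity drop
print(detect_faults(model, xs, observed))
```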

    Fault detection and identification methodology under an incremental learning framework applied to industrial machinery

    An industrial machinery condition monitoring methodology based on ensemble novelty detection and evolving classification is proposed in this study. The methodology helps address current challenges of classical electromechanical system monitoring approaches applied in industrial frameworks, namely the presence of unknown events, the limitation of the starting knowledge to the nominal healthy condition, and the incorporation of new patterns into the available knowledge. The proposed methodology is divided into four main stages: 1) dedicated feature calculation and reduction over the available physical magnitudes to increase novelty detection and fault classification capabilities; 2) novelty detection based on an ensemble of one-class support vector machines to identify previously unconsidered events; 3) diagnosis by means of eClass evolving classifiers for pattern recognition; and 4) re-training to include new patterns in the novelty detection and fault identification models. The effectiveness of the proposed fault detection and identification methodology has been compared with classical approaches and verified by experimental results obtained from an automotive end-of-line test machine.
    This work was supported in part by the Generalitat de Catalunya (GRC MCIA) under Grant no. SGR 2014-101, in part by the Spanish Ministry of Economy and Competitiveness under Project TRA2016-80472-R Research, and in part by the CONACyT Scholarship under Grant 313604.
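    Stages 2 to 4 can be sketched as a small monitoring loop that starts from healthy data only, flags unknown events, and absorbs them as new patterns once they are labelled. This is a minimal sketch with loudly swapped-in components: each one-class SVM is replaced by a hypothetical `SphereDetector` (a hypersphere around a subset of samples), the ensemble is built jackknife-style (each member omits one sample, so every pattern needs at least two samples), and a nearest-centroid rule stands in for the eClass evolving classifier. Stage 1 feature calculation is omitted.

```python
import math


def centroid(samples):
    return [sum(col) / len(samples) for col in zip(*samples)]


class SphereDetector:
    """Stand-in for one one-class SVM: a hypersphere around a subset of samples."""

    def __init__(self, samples, margin):
        self.center = centroid(samples)
        self.radius = margin * max(math.dist(s, self.center) for s in samples)

    def is_novel(self, x):
        return math.dist(x, self.center) > self.radius


class Monitor:
    def __init__(self, healthy, margin=1.2):
        self.margin = margin
        self.patterns, self.ensembles = {}, {}
        self.add_pattern("healthy", healthy)  # starting knowledge: nominal condition only

    def add_pattern(self, name, samples):
        """Stage 4: (re)train the novelty ensemble when a new pattern is labelled."""
        data = list(samples)
        self.patterns[name] = data
        # Jackknife ensemble: member i is trained on all samples except sample i.
        self.ensembles[name] = [
            SphereDetector(data[:i] + data[i + 1:], self.margin) for i in range(len(data))
        ]

    def is_novel(self, x):
        """Stage 2: x is novel only if every known pattern's ensemble rejects it by majority."""
        for members in self.ensembles.values():
            if sum(m.is_novel(x) for m in members) <= len(members) // 2:
                return False
        return True

    def classify(self, x):
        """Stage 3: nearest-centroid stand-in for the eClass evolving classifier."""
        if self.is_novel(x):
            return "unknown"
        return min(self.patterns, key=lambda n: math.dist(x, centroid(self.patterns[n])))


monitor = Monitor([(0, 0), (0, 1), (1, 0), (1, 1)])
print(monitor.classify((0.5, 0.5)), monitor.classify((5.5, 5.5)))
monitor.add_pattern("fault_A", [(5, 5), (5, 6), (6, 5), (6, 6)])
print(monitor.classify((5.5, 5.5)))
```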

    Anomaly Intrusion Detection based on Concept Drift

    Nowadays, security on the internet is a vital issue, and intrusion detection is therefore one of the major research problems for networks that must defend against external attacks. Intrusion detection is an approach to providing security in existing computers and data networks. An Intrusion Detection System is a software application that monitors a system for malicious activities and unauthorized access. Easy accessibility makes computer networks vulnerable to attacks and to various threats from attackers. An Intrusion Detection System is used to analyze a network of interconnected systems in order to prevent unusual intrusions or disruption. Intrusion detection is becoming increasingly challenging as computer networks grow, since increased connectivity gives everyone access and makes it easier for attackers to cover their tracks and avoid identification. The goal of intrusion detection is to identify unauthorized use, misuse, and abuse of computer systems. This project focuses on two algorithms: (i) a Concept Drift based ensemble Incremental Learning approach for anomaly intrusion detection, and (ii) Diversity and Transfer-based Ensemble Learning. Both are highly ranked anomaly detection models, and we study and compare them. The Network Security Laboratory-Knowledge Discovery and Data Mining (NSL-KDD99) dataset has been used for training and for detecting misuse activities.
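    The general idea behind concept-drift-based ensemble incremental learning can be sketched as a chunk-based ensemble: every new labelled chunk of traffic trains a new member, existing members are reweighted by their accuracy on that newest chunk, and the weakest members are dropped. This is a generic sketch, not either of the project's algorithms; the nearest-centroid base learner, the 2-D toy features, the full initial weight for new members and `max_members=3` are all hypothetical choices.

```python
import math
from collections import Counter


class CentroidModel:
    """Hypothetical base learner: nearest class centroid."""

    def __init__(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {label: [sum(col) / len(pts) for col in zip(*pts)]
                          for label, pts in groups.items()}

    def predict(self, x):
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


class DriftEnsemble:
    """Chunk-based ensemble: members are reweighted by accuracy on the newest chunk."""

    def __init__(self, max_members=3):
        self.members = []  # [model, weight] pairs
        self.max_members = max_members

    def update(self, X, y):
        for member in self.members:
            model = member[0]
            member[1] = sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)
        self.members.append([CentroidModel(X, y), 1.0])  # newest member trusted fully
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]  # forget the weakest (most outdated) members

    def predict(self, x):
        votes = Counter()
        for model, weight in self.members:
            votes[model.predict(x)] += weight
        return votes.most_common(1)[0][0]


X = [(0, 0), (0, 1), (5, 5), (5, 6)]
ensemble = DriftEnsemble()
ensemble.update(X, ["normal", "normal", "attack", "attack"])
print(ensemble.predict((0, 0.5)))
ensemble.update(X, ["attack", "attack", "normal", "normal"])  # the concept has drifted
print(ensemble.predict((0, 0.5)))
```

    Members that no longer match the current concept receive weights near zero, so the ensemble follows the drift without retraining old members.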

    Rejection-oriented learning without complete class information

    Machine Learning is commonly used to support decision-making in numerous, diverse contexts. Its usefulness in this regard is unquestionable: there are complex systems built on top of machine learning techniques whose descriptive and predictive capabilities go far beyond those of human beings. However, these systems still have limitations, and analyzing them makes it possible to estimate their applicability and the confidence they merit in various cases. This matters because abstaining from providing a response is preferable to making a mistake in doing so. In classification-like tasks, such an inconclusive output is called rejection. The research that culminated in this thesis led to the conception, implementation, and evaluation of rejection-oriented learning systems for two distinct tasks: open set recognition and data stream clustering. These systems were derived from the WiSARD artificial neural network, into which rejection modelling was incorporated. This text details and discusses these realizations. It also presents experimental results that allow the scientific and practical importance of the proposed state-of-the-art methodology to be assessed.
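    A minimal sketch of a WiSARD classifier with rejection, assuming binary input vectors. Each class has a discriminator made of RAM nodes (modelled here as Python sets of seen addresses), all discriminators share one pseudo-random mapping of input bits into n-bit tuples, and a discriminator's score is the fraction of its RAMs that recognize the input. The rejection rule here (abstain when the best score is below `min_score`) is one simple, assumed criterion; the thesis's own rejection modelling may differ, and `RejectingWiSARD` is a hypothetical name.

```python
import random


class Discriminator:
    """One WiSARD discriminator: a bank of RAM nodes, each a set of seen addresses."""

    def __init__(self, mapping, tuple_size):
        self.mapping = mapping
        self.tuple_size = tuple_size
        self.rams = [set() for _ in range(len(mapping) // tuple_size)]

    def _addresses(self, bits):
        permuted = [bits[i] for i in self.mapping]
        for k, ram in enumerate(self.rams):
            start = k * self.tuple_size
            yield ram, tuple(permuted[start:start + self.tuple_size])

    def train(self, bits):
        for ram, address in self._addresses(bits):
            ram.add(address)

    def score(self, bits):
        """Fraction of RAM nodes that have already seen this input's address."""
        return sum(address in ram for ram, address in self._addresses(bits)) / len(self.rams)


class RejectingWiSARD:
    def __init__(self, input_size, tuple_size, min_score=0.8, seed=0):
        if input_size % tuple_size:
            raise ValueError("input_size must be a multiple of tuple_size")
        self.mapping = list(range(input_size))
        random.Random(seed).shuffle(self.mapping)  # shared pseudo-random input mapping
        self.tuple_size = tuple_size
        self.min_score = min_score
        self.discriminators = {}

    def train(self, bits, label):
        if label not in self.discriminators:
            self.discriminators[label] = Discriminator(self.mapping, self.tuple_size)
        self.discriminators[label].train(bits)

    def predict(self, bits):
        """Return the best label, or None (rejection) if its score is too low."""
        scores = {label: d.score(bits) for label, d in self.discriminators.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] >= self.min_score else None


net = RejectingWiSARD(input_size=8, tuple_size=2)
net.train([1, 1, 1, 1, 0, 0, 0, 0], "a")
net.train([0, 0, 0, 0, 1, 1, 1, 1], "b")
print(net.predict([1, 1, 1, 1, 0, 0, 0, 0]), net.predict([1, 0, 1, 0, 1, 0, 1, 0]))
```

    Training only inserts addresses into sets, which is why WiSARD-style models can learn incrementally, a useful property for the data stream clustering task.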

    Real-Time Machine Learning for Quickest Detection

    Safety-critical Cyber-Physical Systems (CPS) require real-time machine learning for control and decision making. One promising solution is to use deep learning to discover useful patterns for event detection from heterogeneous data. However, deep learning algorithms face three challenges in CPS with assurability requirements: 1) decision explainability, 2) real-time and quickest event detection, and 3) time-efficient incremental learning. To address these obstacles, I developed a real-time Machine Learning Framework for Quickest Detection (MLQD). Specifically, I first propose the zero-bias neural network, which removes decision bias and class preferences from regular neural networks and provides an interpretable decision process. Second, I discover the latent space characteristics of the zero-bias neural network and a method to mathematically convert a Deep Neural Network (DNN) classifier into a performance-assured binary abnormality detector. In this way, I seamlessly integrate the data processing capability of deep neural networks with Quickest Detection (QD) and provide a real-time sequential event detection paradigm. Third, I find that a critical factor impeding the incremental learning of neural networks is concept interference (confusion) in the latent space, and I prove that, to minimize interference, the concept representation vectors (class fingerprints) within the latent space need to be organized orthogonally. Using these findings, I invent a new incremental learning strategy that enables deep neural networks in CPS to evolve efficiently without retraining. All my algorithms are evaluated on real-world applications: ADS-B (Automatic Dependent Surveillance-Broadcast) signal identification and spoofing detection in the aviation communication system. Finally, I discuss current trends in MLQD and conclude this dissertation by presenting future research directions and applications.
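    Page's CUSUM is the classic quickest-detection rule. The minimal sketch below assumes that a DNN already emits a per-sample abnormality score in [0, 1] (for instance, one minus the largest zero-bias similarity) and shows how a sequence of such scores is turned into a single alarm time. The `reference` and `threshold` values are illustrative assumptions, and this simple mean-shift form of CUSUM is not claimed to be the dissertation's performance-assured detector.

```python
def cusum_alarm(scores, reference=0.5, threshold=1.0):
    """One-sided CUSUM: return the index of the first alarm, or None.

    reference: allowance placed between the nominal and abnormal score levels.
    threshold: decision limit h; a larger h gives fewer false alarms but a longer delay.
    """
    statistic = 0.0
    for t, s in enumerate(scores):
        # Accumulate evidence for a shift; reset to zero while scores look nominal.
        statistic = max(0.0, statistic + s - reference)
        if statistic > threshold:
            return t
    return None


scores = [0.1] * 10 + [0.9] * 10  # hypothetical scores; the change happens at index 10
print(cusum_alarm(scores))
```

    The threshold trades detection delay against false-alarm rate, which is the trade-off quickest detection formalizes.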
As a summary, the innovations of this dissertation are as follows: i) I propose the zero-bias neural network, which provides transparent latent space characteristics, and apply it to solve the wireless device identification problem. ii) I discover and prove the orthogonal memory organization mechanism in artificial neural networks and apply this mechanism to time-efficient incremental learning. iii) I discover and mathematically prove the converging point theorem, with which we can predict the latent space topological characteristics and estimate the topological maturity of neural networks. iv) I bridge the gap between machine learning and quickest detection with assurable performance.
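    The two ideas of a zero-bias last layer and orthogonal class fingerprints can be illustrated together in a small sketch. Here the logits have no bias term and are cosine similarities between a latent vector and per-class unit fingerprints, and a new class receives a fingerprint made orthogonal to the existing ones by a Gram-Schmidt step. The latent vectors, the name `ZeroBiasHead`, and deriving fingerprints from class means are hypothetical illustration choices; in the dissertation the fingerprints are learned network weights and the training procedure is its own.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("vector lies in the span of existing fingerprints")
    return [x / norm for x in v]


class ZeroBiasHead:
    """Last layer without bias: logits are cosine similarities to class fingerprints."""

    def __init__(self):
        self.fingerprints = {}

    def add_class(self, label, latent_samples):
        """Add a fingerprint orthogonal to all existing ones (one Gram-Schmidt step)."""
        dim = len(latent_samples[0])
        v = [sum(s[j] for s in latent_samples) / len(latent_samples) for j in range(dim)]
        for fp in self.fingerprints.values():  # existing fingerprints are orthonormal
            proj = dot(v, fp)
            v = [a - proj * f for a, f in zip(v, fp)]
        self.fingerprints[label] = normalize(v)

    def logits(self, z):
        zn = normalize(z)
        return {label: dot(zn, fp) for label, fp in self.fingerprints.items()}

    def predict(self, z):
        scores = self.logits(z)
        return max(scores, key=scores.get)


head = ZeroBiasHead()
head.add_class("a", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
head.add_class("b", [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
print(head.predict([0.2, 1.0, 0.0]), dot(head.fingerprints["a"], head.fingerprints["b"]))
```

    Because a new fingerprint is orthogonal to the old ones, adding a class leaves the existing classes' fingerprints untouched, which is the intuition behind avoiding concept interference.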