
    A survey of machine learning methods applied to anomaly detection on drinking-water quality data

    Abstract: Traditional machine learning (ML) techniques such as support vector machines, logistic regression, and artificial neural networks have been applied most frequently in water quality anomaly detection tasks. This paper reviews the progress and advances made in detecting anomalies in water quality data using ML techniques, covering both traditional ML and deep learning (DL) approaches. Our findings indicate that: 1) DL approaches generally outperform traditional ML techniques in feature learning accuracy and achieve lower false positive rates. However, it is difficult to make a fair comparison between studies because they employ different datasets, models, and parameters. 2) Despite the advances made and the advantages of the extreme learning machine (ELM), ELM remains sparsely exploited in this domain. This study also proposes a hybrid DL-ELM framework as a possible solution that could be investigated further and used to detect anomalies in water quality data.

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated at increasing velocity by various systems, applications, and activities. This increases the demand for stream and time series analysis that reacts to changing conditions in real time, for better efficiency and quality of service delivery as well as improved safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection remains a vital topic in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also generalizes to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges such as complex latent patterns, concept drift, and overfitting, which may mislead the model and cause a high false alarm rate. To handle these challenges, advanced anomaly detection methods develop sophisticated decision logic that turns them into opaque, inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability before they trust a model and the outcomes it produces. Also, pointing users to the most anomalous or malicious regions of a time series and to the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges of an end-to-end pipeline for stream-based anomaly detection through three essential phases: behavior prediction, inference, and interpretation. The first phase focuses on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use related contexts to reclassify observations and post-prune unjustified events.
Finally, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its results in terms of human-understandable concepts. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation supporting our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.
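    As a minimal sketch of the three phases named above (prediction, inference, interpretation), the generator below uses a rolling-mean forecast, a z-score of the residual, and a plain-language reason for each alarm. This is a generic baseline chosen for illustration, not the thesis's models; the function name `detect_stream`, the window size, the 3-sigma threshold, and the sample readings are assumptions.

```python
import statistics
from collections import deque


def detect_stream(values, window=10, threshold=3.0):
    """Yield (index, value, expected, z, reason) for each observation flagged as anomalous.

    Hypothetical baseline: rolling-mean prediction plus a z-score threshold.
    """
    history = deque(maxlen=window)
    for i, v in enumerate(values):
        if len(history) == window:
            # 1) Prediction: expected value is the mean of the last `window` accepted points.
            expected = statistics.fmean(history)
            spread = statistics.pstdev(history) or 1e-9  # guard against a flat window
            # 2) Inference: score the observation by its standardised residual.
            z = (v - expected) / spread
            if abs(z) > threshold:
                # 3) Interpretation: a human-readable reason for the alarm.
                direction = "above" if z > 0 else "below"
                reason = (f"value {v:.2f} is {abs(z):.1f} std devs {direction} "
                          f"the recent mean {expected:.2f}")
                yield i, v, expected, z, reason
                continue  # keep flagged points out of the reference window
        history.append(v)


readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0, 25.0, 10.1]
for idx, _, _, _, why in detect_stream(readings):
    print(idx, why)
```

    Excluding flagged points keeps a burst of anomalies from inflating the baseline, but it also means a genuine level shift (concept drift) would keep being flagged; handling drift is one of the challenges the abstract lists.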

    Climate-informed stochastic hydrological modeling: Incorporating decadal-scale variability using paleo data

    A hierarchical framework for incorporating modes of climate variability into stochastic simulations of hydrological data is developed, termed the climate-informed multi-time scale stochastic (CIMSS) framework. A case study on two catchments in eastern Australia illustrates this framework. To develop an identifiable model characterizing long-term variability for the first level of the hierarchy, paleoclimate proxies and instrumental indices describing the Interdecadal Pacific Oscillation (IPO) and the Pacific Decadal Oscillation (PDO) are analyzed. A new paleo IPO-PDO time series dating back 440 yr is produced by combining seven IPO-PDO paleo sources, using an objective smoothing procedure to fit low-pass filters to individual records. The paleo data analysis indicates that wet/dry IPO-PDO states have a broad range of run lengths, with 90% between 3 and 33 yr and a mean of 15 yr. The Markov chain model, previously used to simulate oscillating wet/dry climate states, is found to underestimate the probability of wet/dry periods >5 yr and is rejected in favor of a gamma distribution for simulating the run lengths of the wet/dry IPO-PDO states. For the second level of the hierarchy, a seasonal rainfall model is conditioned on the simulated IPO-PDO state. The model replicates observed statistics such as seasonal and multiyear accumulated rainfall distributions and interannual autocorrelations. Mean seasonal rainfall in the IPO-PDO dry states is found to be 15%-28% lower than in the wet state at the case study sites. In comparison, an annual lag-one autoregressive model is unable to adequately capture the observed rainfall distribution within separate IPO-PDO states. Copyright © 2011 by the American Geophysical Union.
    Benjamin J. Henley, Mark A. Thyer, George Kuczera and Stewart W. Frank
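    A two-state Markov chain implies geometrically distributed run lengths, which is one reason a more flexible distribution can fit the paleo record better. The sketch below mimics the two-level structure described above: level 1 draws alternating wet/dry run lengths from a gamma distribution, and level 2 draws seasonal rainfall conditioned on each year's state. It is not the CIMSS model itself; the function names, the gamma shape and scale (chosen only so the mean run is roughly 15 yr), the 300 mm wet-state seasonal mean, the 20% dry-state reduction, and the use of a gamma distribution for rainfall are all assumptions.

```python
import random


def simulate_states(n_years, shape=2.0, scale=7.5, seed=1):
    """Level 1 (hypothetical): alternating wet/dry states with gamma-distributed run lengths."""
    rng = random.Random(seed)
    states = []
    state = rng.choice(["wet", "dry"])
    while len(states) < n_years:
        # Run length drawn from Gamma(shape, scale), mean shape * scale, at least 1 yr.
        run = max(1, round(rng.gammavariate(shape, scale)))
        states.extend([state] * run)
        state = "dry" if state == "wet" else "wet"  # states strictly alternate
    return states[:n_years]


def simulate_rainfall(states, wet_mean=300.0, dry_factor=0.8, cv=0.3, seed=2):
    """Level 2 (hypothetical): four seasonal rainfall totals per year, conditioned on the state."""
    rng = random.Random(seed)
    k = 1.0 / cv ** 2  # gamma shape that gives the requested coefficient of variation
    series = []
    for state in states:
        mean = wet_mean if state == "wet" else wet_mean * dry_factor
        for _season in range(4):
            series.append(rng.gammavariate(k, mean / k))  # mean = k * (mean / k)
    return series


states = simulate_states(440)
rain = simulate_rainfall(states)
seasonal_states = [s for s in states for _ in range(4)]
for label in ("wet", "dry"):
    vals = [r for r, s in zip(rain, seasonal_states) if s == label]
    print(label, len(vals) // 4, "yr, mean seasonal rainfall", round(sum(vals) / len(vals), 1))
```

    Swapping the gamma draw for a geometric one would reproduce the Markov-chain alternative the abstract rejects, which makes the two easy to compare on run-length statistics.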

    Anomaly Detection in BACnet/IP managed Building Automation Systems

    Building Automation Systems (BAS) are collections of devices and software that manage the operation of building services. The BAS market is expected to be a USD 19.25 billion industry by 2023, as a core feature of both the Internet of Things and Smart City technologies. However, securing these systems from cyber security threats is an emerging research area. Since initial deployment, BAS have evolved from isolated standalone networks to heterogeneous, interconnected networks allowing external connectivity through the Internet. The most prominent BAS protocol is BACnet/IP, which is estimated to hold 54.6% of world market share. BACnet/IP security features are often not implemented in BAS deployments, leaving systems unprotected against known network threats. This research investigated methods of detecting anomalous network traffic in BACnet/IP managed BAS in an effort to combat threats posed to these systems. This research explored the threats facing BACnet/IP devices through analysis of Internet-accessible BACnet devices, vendor-defined device specifications, investigation of the BACnet specification, and known network attacks identified in the surrounding literature. The collected data were used to construct a threat matrix, which was applied to models of BACnet devices to evaluate potential exposure. Further, two potential unknown vulnerabilities were identified and explored using state modelling and device simulation. A simulation environment and attack framework were constructed to generate both normal and malicious network traffic, in order to explore the application of machine learning algorithms to identifying both known and unknown network anomalies. To identify network patterns that distinguish the generated normal traffic from the malicious traffic, unsupervised clustering, graph analysis with an unsupervised community detection algorithm, and time series analysis were used.
The explored methods identified distinguishable network patterns for frequency-based known network attacks when compared to normal network traffic. However, these methods were found insufficient as stand-alone methods for anomaly detection. Subsequently, Artificial Neural Networks and Hidden Markov Models were explored and found capable of detecting known network attacks. Further, Hidden Markov Models were also capable of detecting unknown network attacks in the generated datasets. The classification accuracy of the Hidden Markov Models was evaluated using the Matthews Correlation Coefficient, which accounts for imbalanced class sizes and assesses both positive and negative classification ability in deriving its metric. The Hidden Markov Models were found capable of repeatedly detecting both known and unknown BACnet/IP attacks, with True Positive Rates greater than 0.99 and Matthews Correlation Coefficients greater than 0.8 for five of six evaluated hosts. This research identified and evaluated a range of methods capable of identifying anomalies in simulated BACnet/IP network traffic. Further, this research found that Hidden Markov Models accurately classified both known and unknown attacks in the evaluated BACnet/IP managed BAS network.
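    The abstract relies on the Matthews Correlation Coefficient (MCC) because it uses all four confusion-matrix cells and stays informative under class imbalance. Its standard definition is MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it alongside the true positive rate; the function names are illustrative, and the counts in the usage lines are made up to show how a detector can reach a TPR of 1.0 while its MCC stays well below 0.8. Returning 0 when the denominator is zero is a common convention, not something the thesis specifies.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 = attack and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def true_positive_rate(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient in [-1, 1]; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical host: 10 attack packets among 1,000.
# Always predicting "normal": 99% accuracy, but TPR = 0 and MCC = 0.
print(true_positive_rate(0, 10), mcc(tp=0, tn=990, fp=0, fn=10))
# Catching every attack but raising 20 false alarms: TPR = 1.0, MCC about 0.57.
print(true_positive_rate(10, 0), round(mcc(tp=10, tn=970, fp=20, fn=0), 3))
```

    Reporting TPR together with MCC, as the abstract does, guards against a detector that catches every attack only by flooding operators with false alarms.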