2,560 research outputs found

A survey of machine learning methods applied to anomaly detection on drinking-water quality data

Author: Aigbavboa Clinton
Dogo Eustace M.
Nwulu Nnamdi I.
Twala Bhekisipho
Publication date: 01/01/2019

Abstract: Traditional machine learning (ML) techniques such as support vector machines, logistic regression, and artificial neural networks have been applied most frequently in water quality anomaly detection tasks. This paper presents a review of progress and advances made in detecting anomalies in water quality data using ML techniques. The review encompasses both traditional ML and deep learning (DL) approaches. Our findings indicate that: 1) Generally, DL approaches outperform traditional ML techniques in terms of feature learning accuracy and lower false positive rates. However, it is difficult to make a fair comparison between studies because of the different datasets, models and parameters employed. 2) Despite the advances made and the advantages of the extreme learning machine (ELM), ELM is sparsely exploited in this domain. This study also proposes a hybrid DL-ELM framework as a possible solution that could be investigated further and used to detect anomalies in water quality data.

University of Johannesburg Institutional Repository

End-to-end anomaly detection in stream data

Author: Zohrevand Zahra
Publication date: 16/12/2020

Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time for enhanced efficiency and quality of service delivery as well as upgraded safety and security in private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing the users to the most anomalous/malicious areas of time series and causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that leads to high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that utilize the related contexts to reclassify the observations and post-prune the unjustified events.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.
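The prediction-then-scoring idea described in this abstract can be illustrated with a deliberately naive stand-in. The sketch below is not the thesis's method: it assumes a trailing moving average as the "behavior prediction" step and scores each point by how far it falls from that prediction, in units of the recent spread. The function name `anomaly_scores`, the `window` parameter, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev


def anomaly_scores(series, window=5):
    """Score each point by its deviation from a moving-average prediction.

    The prediction for point i is the mean of up to `window` preceding
    points (an assumed, naive predictor). The score is the absolute
    residual divided by the standard deviation of that same history.
    """
    scores = []
    for i, value in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 2:
            # Not enough context to predict; treat as non-anomalous.
            scores.append(0.0)
            continue
        predicted = mean(history)
        # Guard against a zero spread when the history is constant.
        spread = pstdev(history) or 1e-9
        scores.append(abs(value - predicted) / spread)
    return scores


# Hypothetical stream with one obvious spike at index 5.
readings = [10, 11, 10, 11, 10, 50, 10, 11]
scores = anomaly_scores(readings)
most_anomalous_index = scores.index(max(scores))
```

Returning a per-point score rather than a yes/no flag keeps the output inspectable, which is in the spirit of the interpretability goal the abstract stresses; a real pipeline would replace the moving average with a learned time series model.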

Simon Fraser University Institutional Repository

Climate-informed stochastic hydrological modeling: Incorporating decadal-scale variability using paleo data

Author: Benjamin J. Henley
Mark A. Thyer
George Kuczera
Stewart W. Franks
Publication venue: 'American Geophysical Union (AGU)'
Publication date: 01/01/2011

A hierarchical framework for incorporating modes of climate variability into stochastic simulations of hydrological data is developed, termed the climate-informed multi-time scale stochastic (CIMSS) framework. A case study on two catchments in eastern Australia illustrates this framework. To develop an identifiable model characterizing long-term variability for the first level of the hierarchy, paleoclimate proxies and instrumental indices describing the Interdecadal Pacific Oscillation (IPO) and the Pacific Decadal Oscillation (PDO) are analyzed. A new paleo IPO-PDO time series dating back 440 yr is produced, combining seven IPO-PDO paleo sources using an objective smoothing procedure to fit low-pass filters to individual records. The paleo data analysis indicates that wet/dry IPO-PDO states have a broad range of run lengths, with 90% between 3 and 33 yr and a mean of 15 yr. The Markov chain model, previously used to simulate oscillating wet/dry climate states, is found to underestimate the probability of wet/dry periods >5 yr, and is rejected in favor of a gamma distribution for simulating the run lengths of the wet/dry IPO-PDO states. For the second level of the hierarchy, a seasonal rainfall model is conditioned on the simulated IPO-PDO state. The model is able to replicate observed statistics such as seasonal and multiyear accumulated rainfall distributions and interannual autocorrelations. Mean seasonal rainfall in the IPO-PDO dry states is found to be 15%-28% lower than in the wet state at the case study sites. In comparison, an annual lag-one autoregressive model is unable to adequately capture the observed rainfall distribution within separate IPO-PDO states. Copyright © 2011 by the American Geophysical Union.
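The first level of the hierarchy, alternating wet/dry states whose run lengths follow a gamma distribution, can be sketched as below. This is an illustration under stated assumptions, not the paper's fitted model: the `shape` and `scale` values are hypothetical, chosen only so that the mean run length (shape × scale) equals the 15 yr reported in the abstract, and the function names are invented for this example. For comparison, a two-state Markov chain implies geometrically distributed run lengths.

```python
import random
from itertools import groupby


def simulate_states(n_years, shape, scale, seed=None):
    """Simulate a yearly wet/dry sequence with gamma-distributed run lengths.

    Returns a list of booleans (True = wet, False = dry) of length n_years.
    """
    rng = random.Random(seed)
    states = []
    wet = rng.random() < 0.5  # pick the starting state at random
    while len(states) < n_years:
        # Draw a run length in years; round and enforce a minimum of 1.
        run = max(1, round(rng.gammavariate(shape, scale)))
        states.extend([wet] * run)
        wet = not wet  # states alternate after each run
    return states[:n_years]  # the final run may be truncated


def run_lengths(states):
    """Lengths of consecutive runs of identical states."""
    return [len(list(group)) for _, group in groupby(states)]


# Hypothetical parameters: mean run length = 2.5 * 6.0 = 15 yr.
states = simulate_states(440, shape=2.5, scale=6.0, seed=1)
lengths = run_lengths(states)
```

A second-level rainfall model would then be conditioned on each year's simulated state, as the abstract describes; that step is not shown here.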

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

University of Newcastle's Digital Repository

Crossref

Adelaide Research & Scholarship

Oxford University Research Archive

Anomaly Detection in BACnet/IP managed Building Automation Systems

Author: Peacock Matthew
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2019

Building Automation Systems (BAS) are a collection of devices and software which manage the operation of building services. The BAS market is expected to be a $19.25 billion USD industry by 2023, as a core feature of both the Internet of Things and Smart City technologies. However, securing these systems from cyber security threats is an emerging research area. Since initial deployment, BAS have evolved from isolated standalone networks to heterogeneous, interconnected networks allowing external connectivity through the Internet. The most prominent BAS protocol is BACnet/IP, which is estimated to hold 54.6% of world market share. BACnet/IP security features are often not implemented in BAS deployments, leaving systems unprotected against known network threats. This research investigated methods of detecting anomalous network traffic in BACnet/IP managed BAS in an effort to combat threats posed to these systems. This research explored the threats facing BACnet/IP devices, through analysis of Internet accessible BACnet devices, vendor-defined device specifications, investigation of the BACnet specification, and known network attacks identified in the surrounding literature. The collected data were used to construct a threat matrix, which was applied to models of BACnet devices to evaluate potential exposure. Further, two potential unknown vulnerabilities were identified and explored using state modelling and device simulation. A simulation environment and attack framework were constructed to generate both normal and malicious network traffic to explore the application of machine learning algorithms to identify both known and unknown network anomalies. To identify network patterns between the generated normal and malicious network traffic, unsupervised clustering, graph analysis with an unsupervised community detection algorithm, and time series analysis were used. 
The explored methods identified distinguishable network patterns for frequency-based known network attacks when compared to normal network traffic. However, as stand-alone methods for anomaly detection, these methods were found insufficient. Subsequently, Artificial Neural Networks and Hidden Markov Models were explored and found capable of detecting known network attacks. Further, Hidden Markov Models were also capable of detecting unknown network attacks in the generated datasets. The classification accuracy of the Hidden Markov Models was evaluated using the Matthews Correlation Coefficient, which accounts for imbalanced class sizes and assesses both positive and negative classification ability in deriving its metric. The Hidden Markov Models were found capable of repeatedly detecting both known and unknown BACnet/IP attacks with True Positive Rates greater than 0.99 and Matthews Correlation Coefficients greater than 0.8 for five of six evaluated hosts. This research identified and evaluated a range of methods capable of identifying anomalies in simulated BACnet/IP network traffic. Further, this research found that Hidden Markov Models were accurate at classifying both known and unknown attacks in the evaluated BACnet/IP managed BAS network.
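The two metrics named in this abstract, True Positive Rate and the Matthews Correlation Coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not results from the thesis. Returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def true_positive_rate(tp, fn):
    """Fraction of actual positives that were detected."""
    total = tp + fn
    return tp / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts, in the range [-1, 1].

    Uses all four cells, so it reflects performance on both the
    positive and the negative class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: 10 attacks, 90 normal flows, and a
# detector that flags everything as an attack.
tpr_all_flagged = true_positive_rate(tp=10, fn=0)
mcc_all_flagged = matthews_corrcoef(tp=10, tn=0, fp=90, fn=0)
```

In this hypothetical case the True Positive Rate is perfect while the MCC is zero, which is why the two are reported together when class sizes are imbalanced.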

Research Online @ ECU

State-of-the-art on research and applications of machine learning in the building life cycle

Author: Hong T
Luo X
Wang Z
Zhang W
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research over the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of the building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimation, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies were adopted broadly by the building industry, due to common challenges including (1) lack of large-scale labeled data to train and validate the model, (2) lack of model transferability, which limits the use of a model trained on one data-rich building in another building with limited data, (3) lack of strong justification of the costs and benefits of deploying machine learning, and (4) performance that might not be reliable and robust for the stated goals, as a method might work for some buildings but not generalize to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science.

eScholarship - University of California