
    A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

    River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data. After identifying end-user needs and defining anomalies, we ranked their importance and selected suitable detection methods. High-priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average (ARIMA) models. However, using other water-quality variables as covariates reduced performance due to complex relationships among variables. Classification of drift and of periods of anomalously low or high variability improved when we replaced anomalous measurements with forecasts, but this inflated false positive rates. Feature-based methods also performed well on high-priority anomalies, but were less proficient at detecting lower-priority anomalies, resulting in high false negative rates. Unlike the regression-based methods, all feature-based methods produced low false positive rates and did not require training or optimization. Rule-based methods successfully detected impossible values and missing observations. We therefore recommend using a combination of methods to improve anomaly detection performance whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users and analysts for optimal outcomes with respect to both detection performance and end-user needs. Our framework is applicable to other types of high-frequency time-series data and anomaly detection applications.
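
    The combination the abstract recommends can be illustrated with a minimal sketch: a rule-based stage for impossible and missing values, followed by a regression-based stage that flags spikes from forecast residuals. This is not the authors' implementation; the naive one-step forecast (standing in for a full ARIMA model), the value range, and the threshold `k` are illustrative assumptions.

```python
from math import sqrt

def detect_anomalies(series, lo=0.0, hi=10000.0, k=4.0, window=20):
    """Flag anomalous indices in a water-quality series (hedged sketch).

    Stage 1 (rule-based): missing (None) or physically impossible values.
    Stage 2 (regression-based): a point is a spike when its one-step
    forecast residual deviates from recent residuals by more than
    k standard deviations. The paper's framework uses ARIMA forecasts;
    here the forecast is simply the previous value.
    """
    flags = set()
    residuals = []
    prev = None
    for i, x in enumerate(series):
        if x is None or not (lo <= x <= hi):
            flags.add(i)                      # rule-based: impossible/missing
            continue
        if prev is not None:
            r = x - prev                      # naive one-step forecast residual
            recent = residuals[-window:]
            if len(recent) >= 5:              # wait for a short warm-up window
                mean = sum(recent) / len(recent)
                sigma = sqrt(sum((v - mean) ** 2 for v in recent) / len(recent)) or 1e-9
                if abs(r - mean) > k * sigma:
                    flags.add(i)              # regression-based: spike
            residuals.append(r)
        prev = x
    return sorted(flags)
```

    As the abstract notes, a production system might additionally replace flagged measurements with forecasts before continuing, at the cost of more false positives.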

    Outlier detection techniques for wireless sensor networks: A survey

    In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks due to the nature of sensor data and the specific requirements and limitations of wireless sensor networks. This survey provides a comprehensive overview of existing outlier detection techniques specifically developed for wireless sensor networks. Additionally, it presents a technique-based taxonomy and a comparative table to be used as a guideline for selecting a technique suitable for the application at hand, based on characteristics such as data type, outlier type, outlier identity, and outlier degree.
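
    One family of techniques covered by such surveys exploits the spatial correlation of sensed data: a node's reading is suspect when it deviates strongly from what its peers observe. The sketch below is a hedged illustration of that median-based idea, not a method from the survey; real schemes typically run in-network over each node's one-hop neighborhood rather than over the whole network.

```python
def spatial_outliers(readings, k=3.0):
    """Flag nodes whose reading deviates from the network-wide median
    by more than k times the median absolute deviation (MAD).

    `readings` maps node id -> sensed value. A minimal, centralized
    sketch of the neighbor/median-based family of WSN techniques.
    """
    values = sorted(readings.values())
    n = len(values)
    med = values[n // 2]                                   # network median
    mad = sorted(abs(v - med) for v in values)[n // 2] or 1e-9
    return {node for node, v in readings.items() if abs(v - med) > k * mad}
```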

    Subspace Energy Monitoring for Anomaly Detection @Sensor or @Edge

    The amount of data generated by distributed monitoring systems that can be exploited for anomaly detection, along with real-time, bandwidth, and scalability requirements, leads to the abandonment of centralized approaches in favor of processing closer to where data are generated. This increases the interest in algorithms that cope with the limited computational resources of gateways or sensor nodes. We here propose two dual, lightweight methods for anomaly detection based on generalized spectral analysis. We monitor the signal energy lying along the principal and anti-principal signal subspaces, and flag an anomaly when this energy changes significantly with respect to normal conditions. A streaming approach for the online estimation of the needed subspaces is also proposed. The methods are tested by applying them to synthetic data and real-world sensor readings. The synthetic setting is used for design-space exploration and highlights the tradeoff between accuracy and computational cost. The real-world example deals with structural health monitoring and shows how, despite their extremely low computational cost, our methods are able to detect permanent and transient anomalies that would classically be detected by full spectral analysis.
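
    The core idea can be sketched in the simplest possible setting: 2-D samples whose principal direction is estimated from the covariance matrix (closed-form eigendecomposition of a 2x2 symmetric matrix), with an anomaly called when the energy in the anti-principal (residual) subspace grows beyond what normal conditions show. This is a hedged, batch-mode illustration; the paper's methods are streaming and dimension-general.

```python
from math import sqrt

def principal_direction(samples):
    """Estimate the principal direction of 2-D samples from their
    covariance matrix [[a, b], [b, c]], whose largest eigenvalue is
    (a + c + sqrt((a - c)^2 + 4 b^2)) / 2 with eigenvector (b, lam - a)."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    a = sum((x - mx) ** 2 for x, _ in samples) / n
    c = sum((y - my) ** 2 for _, y in samples) / n
    b = sum((x - mx) * (y - my) for x, y in samples) / n
    lam = (a + c + sqrt((a - c) ** 2 + 4 * b * b)) / 2
    vx, vy = (b, lam - a) if abs(b) > 1e-12 else (1.0, 0.0)
    norm = sqrt(vx * vx + vy * vy)
    return (vx / norm, vy / norm), (mx, my)

def anti_principal_energy(sample, direction, mean):
    """Energy of a sample in the anti-principal subspace: the squared
    norm of its component orthogonal to the principal direction.
    An anomaly is flagged when this exceeds a threshold learned from
    normal conditions (threshold choice omitted here)."""
    dx, dy = sample[0] - mean[0], sample[1] - mean[1]
    proj = dx * direction[0] + dy * direction[1]
    return dx * dx + dy * dy - proj * proj
```

    A sample lying on the normal signal subspace has near-zero residual energy, while a transient anomaly off that subspace does not, which is exactly the quantity the methods monitor.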

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the characteristics of the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

    Multiple Surface Pipeline Leak Detection Using Real-Time Sensor Data Analysis

    Pipelines enable the largest volume of both intra- and international transportation of oil and gas and play critical roles in the energy sufficiency of countries. The biggest drawback of using pipelines for oil and gas transportation is the problem of oil spills whenever a pipeline loses containment. The severity of an oil spill on the environment is a function of the volume of the spill, which in turn is a function of the time taken to detect the leak and contain the spill. A single leak on the Enbridge pipeline spilled 3.3 million liters into the Kalamazoo River, while a pipeline rupture in North Dakota that went undetected for 143 days spilled 29 million gallons into the environment. Several leak detection systems (LDS) have been developed with the capacity for rapid detection and localization of pipeline leaks, but the characteristics of these LDS limit their leak detection capability. Machine learning provides an opportunity to develop faster LDS, but it requires access to pipeline leak datasets that are proprietary in nature and not readily available. Current LDS have difficulty detecting low-volume/low-pressure spills located far away from the inlet and outlet pressure sensors, for several reasons: the leak-induced pressure variation generated by these leaks is dissipated before it reaches the inlet and outlet pressure sensors; the LDS are designed for a specific minimum detection level, defined as a percentage of the flow volume of the pipeline, so a leak below this value will not be detected; and the perturbations generated by small-volume leaks often fall within the threshold values of the pipeline's normal operational envelope, so the LDS disregards them. These challenges have been responsible for pipeline leaks persisting for weeks, only to be detected by third parties in the vicinity of the leaks.
This research has developed a framework for the generation of pipeline datasets using the PIPESIM software and the RAND function in Python. The topological data of the pipeline right of way, the pipeline network design specification, and the fluid flow properties are the required inputs for this framework. With this information, leaks can be simulated at any point on the pipeline and the datasets generated. This framework will facilitate the generation of a one-class dataset for the pipeline, which can be used for the development of LDS using machine learning. The research also developed a leak detection topology for detecting low-volume leaks. This topology comprises a pressure sensor with remote data transmission capacity installed at the midpoint of the line. The sensor uses an exception-based transmission scheme, transmitting only when the new data differs from the existing data value; this extends the battery life of the sensor. Installing the sensor at the midpoint of the line was found to increase the sensitivity of the LDS to leak-induced pressure variations that were traditionally dissipated before reaching the inlet/outlet sensors. The research also proposed the development of a Leak Detection as a Service (LDaaS) platform, where the pressure data from the inlet and midpoint sensors are collated and subjected to a specially developed leak detection algorithm for the detection of pipeline leaks. This leak detection topology will enable operators to detect low-volume/low-pressure leaks that would have been missed by existing leak detection systems and to deploy oil spill response plans more quickly, thus reducing the volume of oil spilled into the environment. It will also provide a platform for regulators to monitor leak alerts as they are generated and to evaluate the oil spill response plans of the operators.
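
    The exception-based (report-by-exception) transmission scheme described above can be sketched as follows. The deadband parameter is an illustrative assumption: the abstract only says the sensor transmits when the new reading differs from the last transmitted one, and a tolerance band is one common way to define "differs".

```python
def exception_based_stream(samples, deadband=0.5):
    """Report-by-exception sketch: a reading is transmitted only when it
    differs from the last transmitted value by more than the deadband,
    saving radio transmissions and hence battery life.
    Returns the (index, value) pairs that would actually be sent."""
    sent = []
    last = None
    for i, v in enumerate(samples):
        if last is None or abs(v - last) > deadband:
            sent.append((i, v))   # value changed enough: transmit it
            last = v              # remember the last transmitted value
    return sent
```

    On a slowly varying pressure trace, most samples are suppressed and only genuine changes (such as a leak-induced pressure drop) reach the LDaaS platform.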

    Cyber–Physical–Social Frameworks for Urban Big Data Systems: A Survey

    The integration of things’ data on the Web and Web linking for things’ description and discovery is leading the way towards smart Cyber–Physical Systems (CPS). The data generated in CPS represent observations gathered by sensor devices about the ambient environment, which can be manipulated by computational processes of the cyber world. Alongside this, the growing use of social networks offers near real-time citizen-sensing capabilities as a complementary information source. The resulting Cyber–Physical–Social System (CPSS) can help to understand the real world and provide proactive services to users. The nature of CPSS data brings new requirements and challenges to different stages of data manipulation, including identification of data sources, and processing and fusion of different types and scales of data. To gain an understanding of the existing methods and techniques which can be useful for a data-oriented CPSS implementation, this paper presents a survey of the existing research and commercial solutions. We define a conceptual framework for a data-oriented CPSS and detail the various solutions for building human–machine intelligence.