
    Outlier Detection and Missing Value Estimation in Time Series Traffic Count Data: Final Report of SERC Project GR/G23180.

    A serious problem in analysing traffic count data is what to do when missing or extreme values occur, perhaps as a result of a breakdown in automatic counting equipment. The objectives of this work were to investigate ways of solving this problem by: 1) establishing the applicability of time series and influence function techniques for estimating missing values and detecting outliers in time series traffic data; 2) making a comparative assessment of the new techniques against those used in practice by traffic engineers for local, regional or national traffic count systems.

    Two alternative approaches were identified as potentially useful, and these were evaluated and compared with methods currently employed for 'cleaning' traffic count series. The new approaches were based on evaluating the effect of individual observations, or groups of observations, on the estimate of the autocorrelation structure, and on events influencing a parametric (ARIMA) model. They were compared with existing methods, which included visual inspection and smoothing techniques such as the exponentially weighted moving average, in which means and variances are updated using observations from the same time and day of week.

    The results showed advantages and disadvantages for each method. The exponentially weighted moving average tended to flag unreasonable outliers and suggested replacement values that were consistently larger than could reasonably be expected. Methods based on the autocorrelation structure were reasonably successful in detecting events, but their replacement values were suspect, particularly when groups of values needed replacement. These methods also had problems in the presence of non-stationarity, often flagging as outliers values that really reflected a changing level in the data rather than genuinely extreme observations. In the presence of other events, such as a change in level or seasonality, both the influence function and the change in autocorrelation present problems of interpretation, since there is no way of distinguishing these events from outliers. It is clear that the outlier problem cannot be separated from that of identifying structural changes, as many of the statistics used to identify outliers also respond to structural changes.

    An ARIMA(1,0,0)(0,1,1)₇ model (i.e., with weekly seasonal differencing) was found to describe the vast majority of traffic count series, which means that the problem of identifying a starting model can largely be avoided with a high degree of assurance. A black-box approach to data validation is nevertheless prone to error, but methods such as those described above lend themselves to an interactive graphical data-validation technique in which outliers and other events are highlighted for manual acceptance or rejection. An adaptive approach to fitting the model may yield something more automatic and would allow changes in the underlying model to be accommodated.

    In conclusion, methods based on the autocorrelation structure are the most computationally efficient but lead to problems of interpretation, both between different types of event and in the presence of non-stationarity. Using the residuals from a fitted ARIMA model is the most successful method for finding outliers and distinguishing them from other events, and is less expensive than case deletion. The replacement values derived from the ARIMA model were found to be the most accurate.
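
    For concreteness, a day-of-week EWMA screen of the kind described above might look like the following minimal sketch in Python (the smoothing constant, the k-sigma rule, and the use of the slot mean as the replacement value are illustrative assumptions, not the report's exact procedure):

        import numpy as np

        def ewma_screen(counts, alpha=0.2, k=3.0):
            """Flag outliers in daily traffic counts and suggest replacements.

            One EWMA mean/variance is kept per day-of-week slot, mirroring the
            'same time and day of week' updating described in the abstract.
            counts: 1-D sequence of daily counts (np.nan marks missing values).
            Returns (flags, replacements).
            """
            mean = np.full(7, np.nan)  # per-slot EWMA mean
            var = np.full(7, np.nan)   # per-slot EWMA variance
            flags = np.zeros(len(counts), dtype=bool)
            repl = np.array(counts, dtype=float)

            for t, x in enumerate(counts):
                s = t % 7  # day-of-week slot
                if np.isnan(mean[s]):  # initialise the slot on first valid value
                    if not np.isnan(x):
                        mean[s], var[s] = x, 0.0
                    continue
                sd = np.sqrt(var[s])
                if np.isnan(x) or (sd > 0 and abs(x - mean[s]) > k * sd):
                    flags[t] = not np.isnan(x)  # flag extremes, not gaps
                    repl[t] = mean[s]           # replacement = current slot mean
                    x = mean[s]                 # update the EWMA with the replacement
                err = x - mean[s]
                mean[s] += alpha * err          # standard EWMA updates
                var[s] = (1 - alpha) * (var[s] + alpha * err ** 2)
            return flags, repl

    Feeding the replacement, rather than the raw extreme, back into the update keeps a single bad day from inflating the slot's running statistics.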

    Outlier detection techniques for wireless sensor networks: A survey

    In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks, owing to the nature of sensor data and the specific requirements and limitations of such networks. This survey provides a comprehensive overview of existing outlier detection techniques developed specifically for wireless sensor networks. Additionally, it presents a technique-based taxonomy and a comparative table to be used as a guideline for selecting a technique suitable for the application at hand, based on characteristics such as data type, outlier type, outlier identity, and outlier degree.

    Outlier Detection Techniques For Wireless Sensor Networks: A Survey

    In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks, owing to the multivariate nature of sensor data and the specific requirements and limitations of such networks. This survey provides a comprehensive overview of existing outlier detection techniques developed specifically for wireless sensor networks. Additionally, it presents a technique-based taxonomy and a decision tree to be used as a guideline for selecting a technique suitable for the application at hand, based on characteristics such as data type, outlier type, and outlier degree.
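
    Neither abstract reproduces the actual guideline; purely as an illustration of how such a decision tree might be encoded, a sketch follows. The branching criteria and recommended technique classes are assumptions for illustration, not the surveys' actual recommendations:

        def suggest_wsn_technique(data_type, outlier_scope, need_outlier_degree):
            """Toy decision rules in the spirit of the surveys' selection guideline.

            data_type:           'univariate' or 'multivariate' sensor readings
            outlier_scope:       'local' (per node) or 'global' (network-wide)
            need_outlier_degree: True if a continuous outlierness score is required
            """
            if data_type == "univariate":
                return "statistical (e.g., threshold on a fitted parametric model)"
            if outlier_scope == "local":
                return "nearest-neighbour or clustering-based, run in-network"
            if need_outlier_degree:
                return "density-based scoring (reports a degree of outlierness)"
            return "classification-based (e.g., one-class SVM at the sink)"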

    Online Nonparametric Anomaly Detection based on Geometric Entropy Minimization

    We consider the online and nonparametric detection of abrupt and persistent anomalies, such as a change in the regular system dynamics at a time instance due to an anomalous event (e.g., a failure or a malicious activity). Combining the simplicity of the nonparametric Geometric Entropy Minimization (GEM) method with the timely detection capability of the Cumulative Sum (CUSUM) algorithm, we propose a computationally efficient online anomaly detection method that is applicable to high-dimensional datasets and at the same time achieves near-optimum average detection delay for a given false alarm constraint. We provide new insights into both GEM and CUSUM, including a new asymptotic analysis for GEM, which enables soft decisions for outlier detection, and a novel interpretation of CUSUM in terms of discrepancy theory, which helps us generalize it to the nonparametric GEM statistic. We show numerically, using both simulated and real datasets, that the proposed nonparametric algorithm attains performance close to that of the clairvoyant parametric CUSUM test. Comment: to appear in IEEE International Symposium on Information Theory (ISIT) 201
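
    The abstract does not give the statistic itself; the sketch below assumes the GEM score is summarised by a k-nearest-neighbour distance to a nominal training set, and shows how such a score can drive a CUSUM-style recursion (the drift and threshold values are illustrative):

        import numpy as np
        from scipy.spatial import cKDTree

        def gem_cusum(train, stream, k=5, drift=0.1, threshold=10.0):
            """Online nonparametric change detection in the spirit of GEM + CUSUM.

            train:  (n, d) array of nominal, anomaly-free observations.
            stream: iterable of d-dimensional observations arriving online.
            Yields (t, cusum_statistic, alarm) for each observation.
            """
            tree = cKDTree(train)
            # Baseline kNN distance within the nominal data itself
            # (k + 1 neighbours, since each point is its own nearest neighbour).
            d_nom, _ = tree.query(train, k=k + 1)
            baseline = np.median(d_nom[:, -1])

            g = 0.0
            for t, x in enumerate(stream):
                d, _ = tree.query(np.atleast_2d(x), k=k)
                # Positive when x lies farther from the nominal data than usual.
                score = np.log((d[0, -1] + 1e-12) / baseline)
                g = max(0.0, g + score - drift)  # CUSUM recursion
                yield t, g, g > threshold

    The drift term biases the statistic downward under nominal data, so g hovers near zero until the scores turn persistently positive; this is what makes the recursion sensitive to abrupt and persistent changes rather than to isolated outliers.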

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise from mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of computer science and statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
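
    As a concrete instance of the principled statistical techniques such surveys cover, the modified z-score based on the median and MAD is a common baseline. A minimal sketch follows (the 3.5 cut-off is a conventional choice, not a recommendation from this survey):

        import numpy as np

        def robust_zscore_outliers(x, cutoff=3.5):
            """Flag outliers using the modified z-score (median/MAD based).

            Unlike the mean and standard deviation, the median and MAD are not
            themselves distorted by the outliers they are meant to detect.
            """
            x = np.asarray(x, dtype=float)
            med = np.median(x)
            mad = np.median(np.abs(x - med))
            if mad == 0:
                return np.zeros(len(x), dtype=bool)
            z = 0.6745 * (x - med) / mad  # 0.6745 rescales MAD to ~sigma for normal data
            return np.abs(z) > cutoff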

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    Learning how to be robust: Deep polynomial regression

    Polynomial regression is a recurrent problem with a large number of applications. In computer vision it often appears in motion analysis. Whatever the application, standard methods for regression of polynomial models tend to deliver biased results when the input data are heavily contaminated by outliers. Moreover, the problem is even harder when the outliers have strong structure. Departing from problem-tailored heuristics for robust estimation of parametric models, we explore deep convolutional neural networks. Our work aims to find a generic approach for training deep regression models without the explicit need for supervised annotation. We bypass the need for a tailored loss function on the regression parameters by attaching to our model a differentiable, hard-wired decoder corresponding to the polynomial operation at hand. We demonstrate the value of our findings by comparing with standard robust regression methods. Furthermore, we demonstrate how to use such models for a real computer vision problem, namely video stabilization. The qualitative and quantitative experiments show that neural networks are able to learn robustness for general polynomial regression, with results that clearly surpass those of traditional robust estimation methods. Comment: 18 pages, conference
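
    The abstract does not spell out the architecture; the sketch below illustrates the central idea under stated assumptions: a small network regresses polynomial coefficients from the raw samples, a hard-wired differentiable decoder evaluates the polynomial, and the loss is computed on the reconstructed signal rather than on the coefficients. The network size, polynomial degree, contamination scheme and robust data-fit loss are all illustrative choices, not the paper's:

        import torch
        import torch.nn as nn

        class PolyRegressor(nn.Module):
            """Predicts coefficients of a degree-`deg` polynomial from (x, y) samples."""
            def __init__(self, n_points, deg=2):
                super().__init__()
                self.deg = deg
                self.net = nn.Sequential(
                    nn.Linear(2 * n_points, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, deg + 1),  # polynomial coefficients
                )

            def forward(self, x, y):
                coeffs = self.net(torch.cat([x, y], dim=1))        # (B, deg+1)
                # Hard-wired decoder: evaluate the polynomial at the inputs x.
                powers = torch.stack([x ** i for i in range(self.deg + 1)], dim=-1)
                return (powers * coeffs.unsqueeze(1)).sum(-1)      # (B, n_points)

        n_points = 64
        model = PolyRegressor(n_points)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for step in range(1000):
            x = torch.rand(32, n_points) * 2 - 1       # batch of sample positions
            true_c = torch.randn(32, 3)                # hidden ground-truth coefficients
            y = sum(true_c[:, i:i + 1] * x ** i for i in range(3))
            y = y + 0.05 * torch.randn_like(y)         # observation noise
            mask = torch.rand_like(y) < 0.2
            y = torch.where(mask, y + 3.0, y)          # structured outlier contamination
            # Loss compares decoded curves with the observed y, so no ground-truth
            # coefficients (i.e., no supervised annotation) are ever needed.
            loss = nn.functional.smooth_l1_loss(model(x, y), y)
            opt.zero_grad(); loss.backward(); opt.step()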

    Automatic Bayesian Density Analysis

    Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches to exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent in real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data. Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
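
    ABDA itself builds a far richer model (latent variables over statistical data types and per-feature likelihoods); purely as a minimal illustration of the density-based view of anomaly detection it adopts, one can fit a mixture model and flag low-likelihood rows. The mixture size and the 1% threshold below are arbitrary assumptions:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        X = np.vstack([
            rng.normal(0, 1, size=(500, 3)),   # one mode of "normal" data
            rng.normal(5, 1, size=(500, 3)),   # a second mode
            [[0.0, 5.0, -5.0]],                # an anomalous row
        ])

        gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
        log_density = gmm.score_samples(X)           # per-row log-likelihood
        threshold = np.quantile(log_density, 0.01)   # flag the least likely 1%
        print(np.where(log_density < threshold)[0])  # anomalous row indices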