
    Exploring Ways of Identifying Outliers in Spatial Point Patterns

    This work discusses alternative methods for detecting outliers in spatial point patterns. Outliers are defined based on location alone and also with respect to associated variables. Throughout the thesis we discuss five case studies: three come from experiments with spiders and bees, and the other two are earthquake data from a particular region. One of the main conclusions is that, when detecting outliers from the point of view of location, we need to take into consideration both the degree of clustering of the events and the context of the study. When detecting outliers from the point of view of an associated variable, outliers can be identified from a global or a local perspective. For global outliers, one of the main questions addressed is whether the outliers tend to be clustered or randomly distributed in the region. All the work was done using the R programming language.
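
    One simple criterion for location-based outliers of the kind discussed above is to flag events whose distance to their k-th nearest neighbour is unusually large for the pattern. The sketch below is only an illustration of that idea (the thesis itself used R); the choice of k and the MAD-based threshold are assumptions, not the thesis's settings.

```python
# Hypothetical sketch: location outliers in a spatial point pattern, flagged when
# the k-th nearest-neighbour distance is far above the pattern's typical value.
import numpy as np
from scipy.spatial import cKDTree

def location_outliers(points, k=5, n_mads=3.0):
    """points: (n, 2) array of event coordinates; returns a boolean outlier mask."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # k+1: the nearest neighbour is the point itself
    knn_dist = dists[:, -1]                  # distance to the k-th true neighbour
    med = np.median(knn_dist)
    mad = np.median(np.abs(knn_dist - med))
    return knn_dist > med + n_mads * 1.4826 * mad

rng = np.random.default_rng(0)
pts = np.vstack([rng.uniform(0, 1, (200, 2)), [[3.0, 3.0]]])  # one isolated event
print(np.where(location_outliers(pts))[0])
```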

    Does economic geography matter for Pakistan? A spatial exploratory analysis of income and education inequalities

    Generally, econometric studies on socio-economic inequalities consider regions as independent entities, ignoring the likely possibility of spatial interaction between them. This interaction may cause spatial dependency or clustering, which is referred to as spatial autocorrelation. This paper analyzes, for the first time, the spatial clustering of income, income inequality, education, human development, and growth by applying exploratory spatial data analysis (ESDA) techniques to data on 98 Pakistani districts. By detecting outliers and clusters, ESDA allows policy makers to focus on the geography of socio-economic regional characteristics. Global and local measures of spatial autocorrelation have been computed using Moran's I and Geary's C indices to obtain estimates of the spatial autocorrelation of spatial disparities across districts. The overall finding is that the distribution of district-wise income inequality, income, educational attainment, growth, and development levels exhibits a significant tendency for socio-economic inequalities and human development levels to cluster in Pakistan (i.e. the presence of spatial autocorrelation is confirmed). Keywords: spatial effects; spatial exploratory analysis; spatial disparities; income inequality; education inequality; spatial autocorrelation.
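
    As a reference for the global measure mentioned above, the sketch below computes Moran's I for a variable observed on regions with a row-standardised contiguity matrix. The toy four-region weight matrix and values are purely illustrative, not the paper's 98-district data.

```python
# Global Moran's I: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x).
import numpy as np

def morans_i(x, W):
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    return len(x) * np.sum(W * np.outer(z, z)) / (W.sum() * np.sum(z ** 2))

# Toy example: four regions on a line, rook contiguity, row-standardised weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = W / W.sum(axis=1, keepdims=True)
income = [10.0, 11.0, 30.0, 32.0]   # neighbouring regions have similar values
print(morans_i(income, W))           # clearly positive, indicating spatial clustering
```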

    Implementation and assessment of two density-based outlier detection methods over large spatial point clouds

    Several technologies provide datasets consisting of a large number of spatial points, commonly referred to as point clouds. These point datasets provide spatial information regarding the phenomenon that is to be investigated, adding value through knowledge of forms and spatial relationships. Accurate methods for automatic outlier detection are a key step. In this note we use a completely open-source workflow to assess two outlier detection methods, the statistical outlier removal (SOR) filter and the local outlier factor (LOF) filter. The latter was implemented ex novo for this work using the Point Cloud Library (PCL) environment. Source code is available in a GitHub repository for inclusion in PCL builds. Two very different spatial point datasets are used for accuracy assessment. One is obtained from dense image matching of a photogrammetric survey (SfM) and the other from floating car data (FCD) coming from a smart-city mobility framework that provides a position every second along two public transportation bus tracks. Outliers were simulated in the SfM dataset and manually detected and selected in the FCD dataset. Simulation in SfM was carried out in order to create a controlled set with two classes of outliers: clustered points (up to 30 points per cluster) and isolated points, in both cases at random distances from the other points. The optimal number of nearest neighbours (KNN) and the optimal thresholds of SOR and LOF values were defined using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Absolute differences from the median values of LOF and SOR (defined as LOF2 and SOR2) were also tested as metrics for detecting outliers, and optimal thresholds were defined through the AUC of ROC curves. Results show a strong dependency on the point distribution in the dataset and on local density fluctuations. In the SfM dataset the LOF2 and SOR2 methods performed best, with an optimal KNN value of 60; the LOF2 approach gave a slightly better result when considering clustered outliers (true positive rate: LOF2 = 59.7%, SOR2 = 53%). For FCD, SOR with low KNN values performed better for one of the two bus tracks, and LOF with high KNN values for the other; these differences are due to very different local point densities. We conclude that the choice of outlier detection algorithm depends very much on the characteristics of the dataset's point distribution; no one solution fits all. The conclusions provide some guidance on which characteristics of the datasets can help in choosing the optimal method and KNN values.
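
    A minimal sketch of the two filters being compared, on a generic (n, 3) point cloud: SOR implemented with a KD-tree and LOF via scikit-learn. This is only an illustration; the paper's LOF filter lives inside PCL, and the neighbourhood size and thresholds below are assumptions rather than the paper's tuned values.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.neighbors import LocalOutlierFactor

def sor_filter(cloud, k=60, std_mult=1.0):
    """Statistical outlier removal: flag points whose mean k-NN distance is more
    than std_mult standard deviations above the global mean of that quantity."""
    tree = cKDTree(cloud)
    d, _ = tree.query(cloud, k=k + 1)        # column 0 is the point itself
    mean_knn = d[:, 1:].mean(axis=1)
    return mean_knn > mean_knn.mean() + std_mult * mean_knn.std()

def lof_scores(cloud, k=60):
    """Local outlier factor scores; values well above 1 mark points that are
    locally much sparser than their neighbours."""
    lof = LocalOutlierFactor(n_neighbors=k)
    lof.fit(cloud)
    return -lof.negative_outlier_factor_

rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal(size=(1000, 3)), rng.uniform(5, 6, size=(20, 3))])
print(sor_filter(cloud).sum(), (lof_scores(cloud) > 1.5).sum())
```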

    An Iterative Procedure for Outlier Detection in GSTAR(1;1) Model

    Outliers are observations that differ significantly from the others; they can affect the estimation results of a model and reduce the estimator's accuracy. One way to deal with outliers is to remove them from the data. However, important information is sometimes contained in the outlier, so eliminating outliers can lead to misinterpretation. There are two types of outliers in time series models: the innovative outlier (IO) and the additive outlier (AO). Outliers can also be detected in the GSTAR model, which accounts for both spatial and time correlations. We introduce an iterative procedure for detecting outliers in the GSTAR model. The first step is to form a GSTAR model without outlier factors. Outliers are then detected from the model's residuals. If an outlier is detected, an outlier factor is added to the initial model and the parameters are re-estimated, so that a new GSTAR model and new residuals are obtained. The process of detecting outliers and adding them to the model is repeated until a GSTAR model is obtained in which no outliers are detected. As a result, outliers are neither removed nor ignored; instead, outlier factors are added to the GSTAR model. This paper presents a case study of Dengue Hemorrhagic Fever cases at five locations in West Kalimantan Province, modelled with a GSTAR model that includes outlier factors. The result is that the iterative procedure for detecting outliers based on the GSTAR model's residuals provides better accuracy than the regular GSTAR model (without outlier factors added to the model). The problem can thus be handled without removing outliers from the data, by adding outlier factors to the model; this way, the critical information in the outliers is not lost, and a more accurate model is obtained.
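
    The iterative loop described above can be illustrated on a GSTAR(1;1)-style regression in which each location is explained by its own time lag and a spatially weighted lag of its neighbours. The sketch below is only a schematic of that loop under assumed choices (ordinary least squares per location, a |standardized residual| > 3 rule, uniform weights in the toy example); it is not the paper's estimation scheme.

```python
import numpy as np

def fit_location(y, x_lag, x_splag, ao_times):
    """OLS fit of one location's series with additive-outlier indicator columns;
    returns the residual series."""
    T = len(y)
    X = np.column_stack([x_lag, x_splag] +
                        [(np.arange(T) == t).astype(float) for t in ao_times])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def iterative_ao_detection(Y, W, thresh=3.0, max_iter=20):
    """Y: (T, N) space-time series, W: (N, N) row-standardised spatial weights.
    Returns, per location, the detected additive-outlier time indices."""
    T, N = Y.shape
    y, x_lag, x_splag = Y[1:], Y[:-1], Y[:-1] @ W.T
    detected = [[] for _ in range(N)]
    for i in range(N):
        for _ in range(max_iter):
            resid = fit_location(y[:, i], x_lag[:, i], x_splag[:, i], detected[i])
            z = resid / resid.std()
            t_worst = int(np.argmax(np.abs(z)))
            if abs(z[t_worst]) <= thresh:
                break                        # no further outliers: stop iterating
            detected[i].append(t_worst)      # add an outlier factor and refit
    return detected

rng = np.random.default_rng(4)
W = np.full((5, 5), 0.25); np.fill_diagonal(W, 0.0)   # five locations, uniform weights
Y = rng.normal(size=(60, 5)); Y[30, 2] += 8.0          # inject one additive outlier
print(iterative_ao_detection(Y, W))
```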

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach that considers the distribution of the flows in a given time interval. Flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to incoming flow distribution probabilities: inliers are stored to enrich the FDP databases, while outliers are excluded from them. Moreover, a k-nearest neighbour distance-based outlier detection method is investigated and adapted for FDP outlier detection. To validate the proposed framework, real data from the Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data; the results show that our approach outperforms the baseline algorithms for high urban traffic flow.
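
    The sketch below illustrates the general mechanism described above: score each new interval's flow distribution by its mean distance to the k nearest historical FDPs, keep inliers to enrich the database, and reject outliers. The Euclidean metric, k, the threshold, and the Dirichlet toy data are assumptions for illustration, not the paper's exact adapted k-NN procedure.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_outlier_score(fdp_db, new_fdp, k=5):
    """fdp_db: (m, bins) historical distributions; new_fdp: (bins,) candidate."""
    d = cdist(new_fdp[None, :], fdp_db)[0]
    return np.sort(d)[:k].mean()               # mean distance to the k nearest FDPs

def update_database(fdp_db, new_fdp, k=5, thresh=0.3):
    """Returns (possibly enlarged database, is_outlier flag)."""
    if knn_outlier_score(fdp_db, new_fdp, k) <= thresh:
        return np.vstack([fdp_db, new_fdp]), False   # inlier: enrich the database
    return fdp_db, True                              # outlier: database unchanged

rng = np.random.default_rng(2)
db = rng.dirichlet(np.ones(8) * 5, size=100)          # historical flow distributions
typical = rng.dirichlet(np.ones(8) * 5)
skewed = rng.dirichlet(np.array([20.0, 1, 1, 1, 1, 1, 1, 1]))
print(update_database(db, typical)[1], update_database(db, skewed)[1])
```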

    Automatic Detection of Outliers in Multibeam Echo Sounding Data

    The data volumes produced by new-generation multibeam systems are very large, especially for shallow-water systems. Results from recent multibeam surveys indicate that the ratio of field survey time to the time used in interactive editing through graphical editing tools is about 1:1. An important reason for the large amount of processing time is that users subjectively decide which soundings are outliers. There is an apparent need for an automated approach to detecting outliers that would reduce the extensive labor and produce consistent results from the multibeam data cleaning process, independent of the individual who processed the data. The proposed automated algorithm for cleaning multibeam soundings was tested using the SAX-99 (Destin, FL) multibeam survey data [2]. Eight days of survey data (6.9 gigabytes) were cleaned in 2.5 hours on an SGI platform. A comparison of the automatically cleaned data with the subjective, interactively cleaned data indicates that the proposed method is, if not better, at least equivalent to interactive editing as used on the SAX-99 multibeam data. Furthermore, the ratio of acquisition to processing time is considerably improved, since the time required for cleaning the data was decreased from 192 hours to 2.5 hours (an improvement by a factor of 77).
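
    The abstract does not spell out the cleaning algorithm itself, so the sketch below is only a generic illustration of automated sounding cleaning: grid the soundings and flag depths that deviate from their cell's median by more than a robust MAD-based threshold. The cell size and multiplier are assumptions, not parameters of the paper's method.

```python
import numpy as np

def clean_soundings(x, y, z, cell=25.0, n_mads=3.0):
    """x, y: horizontal positions in metres; z: depths.
    Returns a boolean mask marking suspected outlier soundings."""
    ix = np.floor(x / cell).astype(int)
    iy = np.floor(y / cell).astype(int)
    outlier = np.zeros(z.shape, dtype=bool)
    for cx, cy in set(zip(ix, iy)):           # process each grid cell independently
        m = (ix == cx) & (iy == cy)
        med = np.median(z[m])
        mad = np.median(np.abs(z[m] - med)) + 1e-9
        outlier[m] = np.abs(z[m] - med) > n_mads * 1.4826 * mad
    return outlier

rng = np.random.default_rng(5)
x, y = rng.uniform(0, 500, 5000), rng.uniform(0, 500, 5000)
z = 30.0 + 0.3 * rng.normal(size=5000)
z[:10] -= 10.0                                # a few spurious shoal soundings
print(clean_soundings(x, y, z)[:10])
```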

    Outlier Detection and Comparison of Origin-Destination Flows Using Data Depth

    Advances in location-aware technology have resulted in massive trajectory data. Origin-destination (OD) trajectories provide rich information on urban flow and transport demand. This study describes a new method for detecting OD flow outliers and conducting hypothesis testing between two OD flow datasets in terms of variation in spatial extent, that is, spread. The proposed method is based on data depth, which measures the centrality and outlyingness of a point with respect to a given dataset in R^d. Based on the center-outward ordering property, the proposed method analyzes the underlying characteristics of OD flows, such as location, outlyingness, and spread. The ability of the method to detect OD anomalies is compared with that of the Mahalanobis distance approach, and an F-test is used to verify the difference in scale. Empirical evaluation has demonstrated that our method effectively identifies OD flow outliers in an interactive way. Furthermore, the method can provide new perspectives, such as spatial extent, by considering the overall structure of data when comparing two different OD flows in terms of scale.
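
    For intuition about the comparison baseline above, the sketch below treats each OD flow as a 4-d vector (origin x, y, destination x, y), flags flows with large Mahalanobis distance, and uses the simple Mahalanobis depth 1 / (1 + d^2) for a centre-outward ordering. The paper's depth function may well be a different one (for example halfspace depth), and the chi-square cut-off is an assumption.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_d2(flows):
    """flows: (n, 4) OD vectors; returns squared Mahalanobis distances to the mean."""
    mu = flows.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(flows, rowvar=False))
    diff = flows - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def od_outliers(flows, alpha=0.01):
    d2 = mahalanobis_d2(flows)
    depth = 1.0 / (1.0 + d2)                       # centre-outward ordering
    cutoff = chi2.ppf(1 - alpha, df=flows.shape[1])
    return d2 > cutoff, depth

rng = np.random.default_rng(3)
flows = rng.normal([0, 0, 5, 5], 1.0, size=(500, 4))
flows[0] = [0, 0, 20, 20]                          # one flow with an anomalous destination
mask, depth = od_outliers(flows)
print(mask[0], depth.argmin() == 0)                # anomalous flow is flagged and least deep
```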