
    Exploring Ways of Identifying Outliers in Spatial Point Patterns

    This work discusses alternative methods for detecting outliers in spatial point patterns. Outliers are defined based on location alone and also with respect to associated variables. Throughout the thesis we discuss five case studies: three come from experiments with spiders and bees, and the other two are earthquake data from a particular region. One of the main conclusions is that, when detecting outliers from the point of view of location, we need to take into consideration both the degree of clustering of the events and the context of the study. When detecting outliers from the point of view of an associated variable, outliers can be identified from a global or a local perspective. For global outliers, one of the main questions addressed is whether the outliers tend to be clustered or randomly distributed in the region. All the work was done using the R programming language.
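
    One simple criterion for location-based outliers of the kind discussed above is to flag events whose distance to their k-th nearest neighbour is unusually large for the pattern. The sketch below is only an illustration of that idea (the thesis itself used R); the choice of k and the MAD-based threshold are assumptions, not the thesis's settings.

```python
# Hypothetical sketch: location outliers in a spatial point pattern, flagged when
# the k-th nearest-neighbour distance is far above the pattern's typical value.
import numpy as np
from scipy.spatial import cKDTree

def location_outliers(points, k=5, n_mads=3.0):
    """points: (n, 2) array of event coordinates; returns a boolean outlier mask."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # k+1: the nearest neighbour is the point itself
    knn_dist = dists[:, -1]                  # distance to the k-th true neighbour
    med = np.median(knn_dist)
    mad = np.median(np.abs(knn_dist - med))
    return knn_dist > med + n_mads * 1.4826 * mad

rng = np.random.default_rng(0)
pts = np.vstack([rng.uniform(0, 1, (200, 2)), [[3.0, 3.0]]])  # one isolated event
print(np.where(location_outliers(pts))[0])
```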

    Does economic geography matter for Pakistan? A spatial exploratory analysis of income and education inequalities

    Generally, econometric studies on socio-economic inequalities consider regions as independent entities, ignoring the likely possibility of spatial interaction between them. This interaction may cause spatial dependency or clustering, which is referred to as spatial autocorrelation. This paper analyzes, for the first time, the spatial clustering of income, income inequality, education, human development, and growth by applying exploratory spatial data analysis (ESDA) techniques to data on 98 Pakistani districts. By detecting outliers and clusters, ESDA allows policy makers to focus on the geography of socio-economic regional characteristics. Global and local measures of spatial autocorrelation have been computed using Moran's I and Geary's C indices to obtain estimates of the spatial autocorrelation of spatial disparities across districts. The overall finding is that the distribution of district-wise income inequality, income, educational attainment, growth, and development levels exhibits a significant tendency for socio-economic inequalities and human development levels to cluster in Pakistan (i.e. the presence of spatial autocorrelation is confirmed). Keywords: spatial effects; spatial exploratory analysis; spatial disparities; income inequality; education inequality; spatial autocorrelation.
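
    As a reference for the global measure mentioned above, the sketch below computes Moran's I for a variable observed on regions with a row-standardised contiguity matrix. The toy four-region weight matrix and values are purely illustrative, not the paper's 98-district data.

```python
# Global Moran's I: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x).
import numpy as np

def morans_i(x, W):
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    return len(x) * np.sum(W * np.outer(z, z)) / (W.sum() * np.sum(z ** 2))

# Toy example: four regions on a line, rook contiguity, row-standardised weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = W / W.sum(axis=1, keepdims=True)
income = [10.0, 11.0, 30.0, 32.0]   # neighbouring regions have similar values
print(morans_i(income, W))           # clearly positive, indicating spatial clustering
```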

    Implementation and assessment of two density-based outlier detection methods over large spatial point clouds

    Several technologies provide datasets consisting of a large number of spatial points, commonly referred to as point clouds. These point datasets provide spatial information regarding the phenomenon that is to be investigated, adding value through knowledge of forms and spatial relationships. Accurate methods for automatic outlier detection are a key step. In this note we use a completely open-source workflow to assess two outlier detection methods, the statistical outlier removal (SOR) filter and the local outlier factor (LOF) filter. The latter was implemented ex novo for this work using the Point Cloud Library (PCL) environment. Source code is available in a GitHub repository for inclusion in PCL builds. Two very different spatial point datasets are used for accuracy assessment. One is obtained from dense image matching of a photogrammetric survey (SfM) and the other from floating car data (FCD) coming from a smart-city mobility framework that provides a position every second along two public transportation bus tracks. Outliers were simulated in the SfM dataset and manually detected and selected in the FCD dataset. Simulation in SfM was carried out in order to create a controlled set with two classes of outliers: clustered points (up to 30 points per cluster) and isolated points, in both cases at random distances from the other points. The optimal number of nearest neighbours (KNN) and the optimal thresholds of SOR and LOF values were defined using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Absolute differences from the median values of LOF and SOR (defined as LOF2 and SOR2) were also tested as metrics for detecting outliers, and optimal thresholds were defined through the AUC of ROC curves. Results show a strong dependency on the point distribution in the dataset and on local density fluctuations. In the SfM dataset the LOF2 and SOR2 methods performed best, with an optimal KNN value of 60; the LOF2 approach gave a slightly better result when considering clustered outliers (true positive rate: LOF2 = 59.7%, SOR2 = 53%). For FCD, SOR with low KNN values performed better for one of the two bus tracks, and LOF with high KNN values for the other; these differences are due to very different local point densities. We conclude that the choice of outlier detection algorithm depends very much on the characteristics of the dataset's point distribution; no one solution fits all. The conclusions provide some guidance on which characteristics of the datasets can help in choosing the optimal method and KNN values.
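
    A minimal sketch of the two filters being compared, on a generic (n, 3) point cloud: SOR implemented with a KD-tree and LOF via scikit-learn. This is only an illustration; the paper's LOF filter lives inside PCL, and the neighbourhood size and thresholds below are assumptions rather than the paper's tuned values.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.neighbors import LocalOutlierFactor

def sor_filter(cloud, k=60, std_mult=1.0):
    """Statistical outlier removal: flag points whose mean k-NN distance is more
    than std_mult standard deviations above the global mean of that quantity."""
    tree = cKDTree(cloud)
    d, _ = tree.query(cloud, k=k + 1)        # column 0 is the point itself
    mean_knn = d[:, 1:].mean(axis=1)
    return mean_knn > mean_knn.mean() + std_mult * mean_knn.std()

def lof_scores(cloud, k=60):
    """Local outlier factor scores; values well above 1 mark points that are
    locally much sparser than their neighbours."""
    lof = LocalOutlierFactor(n_neighbors=k)
    lof.fit(cloud)
    return -lof.negative_outlier_factor_

rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal(size=(1000, 3)), rng.uniform(5, 6, size=(20, 3))])
print(sor_filter(cloud).sum(), (lof_scores(cloud) > 1.5).sum())
```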

    An Iterative Procedure for Outlier Detection in GSTAR(1;1) Model

    Outliers are observations that differ significantly from the others; they can affect the estimation results of a model and reduce the estimator's accuracy. One way to deal with outliers is to remove them from the data. However, important information is sometimes contained in the outlier, so eliminating outliers can lead to misinterpretation. There are two types of outliers in time series models: the innovative outlier (IO) and the additive outlier (AO). Outliers can also be detected in the GSTAR model, which accounts for both spatial and time correlations. We introduce an iterative procedure for detecting outliers in the GSTAR model. The first step is to form a GSTAR model without outlier factors. Outliers are then detected from the model's residuals. If an outlier is detected, an outlier factor is added to the initial model and the parameters are re-estimated, so that a new GSTAR model and new residuals are obtained. The process of detecting outliers and adding them to the model is repeated until a GSTAR model is obtained in which no outliers are detected. As a result, outliers are neither removed nor ignored; instead, outlier factors are added to the GSTAR model. This paper presents a case study of Dengue Hemorrhagic Fever cases at five locations in West Kalimantan Province, modelled with a GSTAR model that includes outlier factors. The result is that the iterative procedure for detecting outliers based on the GSTAR model's residuals provides better accuracy than the regular GSTAR model (without outlier factors added to the model). The problem can thus be handled without removing outliers from the data, by adding outlier factors to the model; this way, the critical information in the outliers is not lost, and a more accurate model is obtained.
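
    The iterative loop described above can be illustrated on a GSTAR(1;1)-style regression in which each location is explained by its own time lag and a spatially weighted lag of its neighbours. The sketch below is only a schematic of that loop under assumed choices (ordinary least squares per location, a |standardized residual| > 3 rule, uniform weights in the toy example); it is not the paper's estimation scheme.

```python
import numpy as np

def fit_location(y, x_lag, x_splag, ao_times):
    """OLS fit of one location's series with additive-outlier indicator columns;
    returns the residual series."""
    T = len(y)
    X = np.column_stack([x_lag, x_splag] +
                        [(np.arange(T) == t).astype(float) for t in ao_times])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def iterative_ao_detection(Y, W, thresh=3.0, max_iter=20):
    """Y: (T, N) space-time series, W: (N, N) row-standardised spatial weights.
    Returns, per location, the detected additive-outlier time indices."""
    T, N = Y.shape
    y, x_lag, x_splag = Y[1:], Y[:-1], Y[:-1] @ W.T
    detected = [[] for _ in range(N)]
    for i in range(N):
        for _ in range(max_iter):
            resid = fit_location(y[:, i], x_lag[:, i], x_splag[:, i], detected[i])
            z = resid / resid.std()
            t_worst = int(np.argmax(np.abs(z)))
            if abs(z[t_worst]) <= thresh:
                break                        # no further outliers: stop iterating
            detected[i].append(t_worst)      # add an outlier factor and refit
    return detected

rng = np.random.default_rng(4)
W = np.full((5, 5), 0.25); np.fill_diagonal(W, 0.0)   # five locations, uniform weights
Y = rng.normal(size=(60, 5)); Y[30, 2] += 8.0          # inject one additive outlier
print(iterative_ao_detection(Y, W))
```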

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach that considers the distribution of the flows in a given time interval. Flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to incoming flow distribution probabilities: inliers are stored to enrich the FDP databases, while outliers are excluded from them. Moreover, a k-nearest neighbour distance-based outlier detection method is investigated and adapted for FDP outlier detection. To validate the proposed framework, real data from the Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data; the results show that our approach outperforms the baseline algorithms for high urban traffic flow.
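
    The sketch below illustrates the general mechanism described above: score each new interval's flow distribution by its mean distance to the k nearest historical FDPs, keep inliers to enrich the database, and reject outliers. The Euclidean metric, k, the threshold, and the Dirichlet toy data are assumptions for illustration, not the paper's exact adapted k-NN procedure.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_outlier_score(fdp_db, new_fdp, k=5):
    """fdp_db: (m, bins) historical distributions; new_fdp: (bins,) candidate."""
    d = cdist(new_fdp[None, :], fdp_db)[0]
    return np.sort(d)[:k].mean()               # mean distance to the k nearest FDPs

def update_database(fdp_db, new_fdp, k=5, thresh=0.3):
    """Returns (possibly enlarged database, is_outlier flag)."""
    if knn_outlier_score(fdp_db, new_fdp, k) <= thresh:
        return np.vstack([fdp_db, new_fdp]), False   # inlier: enrich the database
    return fdp_db, True                              # outlier: database unchanged

rng = np.random.default_rng(2)
db = rng.dirichlet(np.ones(8) * 5, size=100)          # historical flow distributions
typical = rng.dirichlet(np.ones(8) * 5)
skewed = rng.dirichlet(np.array([20.0, 1, 1, 1, 1, 1, 1, 1]))
print(update_database(db, typical)[1], update_database(db, skewed)[1])
```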

    Automatic Detection of Outliers in Multibeam Echo Sounding Data

    The data volumes produced by new-generation multibeam systems are very large, especially for shallow-water systems. Results from recent multibeam surveys indicate that the ratio of field survey time to the time used in interactive editing through graphical editing tools is about 1:1. An important reason for the large amount of processing time is that users subjectively decide which soundings are outliers. There is an apparent need for an automated approach to detecting outliers that would reduce the extensive labor and produce consistent results from the multibeam data cleaning process, independent of the individual who processed the data. The proposed automated algorithm for cleaning multibeam soundings was tested using the SAX-99 (Destin, FL) multibeam survey data [2]. Eight days of survey data (6.9 gigabytes) were cleaned in 2.5 hours on an SGI platform. A comparison of the automatically cleaned data with the subjective, interactively cleaned data indicates that the proposed method is, if not better, at least equivalent to interactive editing as used on the SAX-99 multibeam data. Furthermore, the ratio of acquisition to processing time is considerably improved, since the time required for cleaning the data was decreased from 192 hours to 2.5 hours (an improvement by a factor of 77).
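
    The abstract does not spell out the cleaning algorithm itself, so the sketch below is only a generic illustration of automated sounding cleaning: grid the soundings and flag depths that deviate from their cell's median by more than a robust MAD-based threshold. The cell size and multiplier are assumptions, not parameters of the paper's method.

```python
import numpy as np

def clean_soundings(x, y, z, cell=25.0, n_mads=3.0):
    """x, y: horizontal positions in metres; z: depths.
    Returns a boolean mask marking suspected outlier soundings."""
    ix = np.floor(x / cell).astype(int)
    iy = np.floor(y / cell).astype(int)
    outlier = np.zeros(z.shape, dtype=bool)
    for cx, cy in set(zip(ix, iy)):           # process each grid cell independently
        m = (ix == cx) & (iy == cy)
        med = np.median(z[m])
        mad = np.median(np.abs(z[m] - med)) + 1e-9
        outlier[m] = np.abs(z[m] - med) > n_mads * 1.4826 * mad
    return outlier

rng = np.random.default_rng(5)
x, y = rng.uniform(0, 500, 5000), rng.uniform(0, 500, 5000)
z = 30.0 + 0.3 * rng.normal(size=5000)
z[:10] -= 10.0                                # a few spurious shoal soundings
print(clean_soundings(x, y, z)[:10])
```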

    Outlier Detection and Comparison of Origin-Destination Flows Using Data Depth

    Advances in location-aware technology have resulted in massive trajectory data. Origin-destination (OD) trajectories provide rich information on urban flow and transport demand. This study describes a new method for detecting OD flow outliers and conducting hypothesis testing between two OD flow datasets in terms of variation in spatial extent, that is, spread. The proposed method is based on data depth, which measures the centrality and outlyingness of a point with respect to a given dataset in R^d. Based on the center-outward ordering property, the proposed method analyzes the underlying characteristics of OD flows, such as location, outlyingness, and spread. The ability of the method to detect OD anomalies is compared with that of the Mahalanobis distance approach, and an F-test is used to verify the difference in scale. Empirical evaluation has demonstrated that our method effectively identifies OD flow outliers in an interactive way. Furthermore, the method can provide new perspectives, such as spatial extent, by considering the overall structure of data when comparing two different OD flows in terms of scale.
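
    For intuition about the comparison baseline above, the sketch below treats each OD flow as a 4-d vector (origin x, y, destination x, y), flags flows with large Mahalanobis distance, and uses the simple Mahalanobis depth 1 / (1 + d^2) for a centre-outward ordering. The paper's depth function may well be a different one (for example halfspace depth), and the chi-square cut-off is an assumption.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_d2(flows):
    """flows: (n, 4) OD vectors; returns squared Mahalanobis distances to the mean."""
    mu = flows.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(flows, rowvar=False))
    diff = flows - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def od_outliers(flows, alpha=0.01):
    d2 = mahalanobis_d2(flows)
    depth = 1.0 / (1.0 + d2)                       # centre-outward ordering
    cutoff = chi2.ppf(1 - alpha, df=flows.shape[1])
    return d2 > cutoff, depth

rng = np.random.default_rng(3)
flows = rng.normal([0, 0, 5, 5], 1.0, size=(500, 4))
flows[0] = [0, 0, 20, 20]                          # one flow with an anomalous destination
mask, depth = od_outliers(flows)
print(mask[0], depth.argmin() == 0)                # anomalous flow is flagged and least deep
```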