6,925 research outputs found

    Implementation and assessment of two density-based outlier detection methods over large spatial point clouds

    Get PDF
    Several technologies provide datasets consisting of a large number of spatial points, commonly referred to as point-clouds. These point datasets provide spatial information regarding the phenomenon that is to be investigated, adding value through knowledge of forms and spatial relationships. Accurate methods for automatic outlier detection is a key step. In this note we use a completely open-source workflow to assess two outlier detection methods, statistical outlier removal (SOR) filter and local outlier factor (LOF) filter. The latter was implemented ex-novo for this work using the Point Cloud Library (PCL) environment. Source code is available in a GitHub repository for inclusion in PCL builds. Two very different spatial point datasets are used for accuracy assessment. One is obtained from dense image matching of a photogrammetric survey (SfM) and the other from floating car data (FCD) coming from a smart-city mobility framework providing a position every second of two public transportation bus tracks. Outliers were simulated in the SfM dataset, and manually detected and selected in the FCD dataset. Simulation in SfM was carried out in order to create a controlled set with two classes of outliers: clustered points (up to 30 points per cluster) and isolated points, in both cases at random distances from the other points. Optimal number of nearest neighbours (KNN) and optimal thresholds of SOR and LOF values were defined using area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Absolute differences from median values of LOF and SOR (defined as LOF2 and SOR2) were also tested as metrics for detecting outliers, and optimal thresholds defined through AUC of ROC curves. Results show a strong dependency on the point distribution in the dataset and in the local density fluctuations. In SfM dataset the LOF2 and SOR2 methods performed best, with an optimal KNN value of 60; LOF2 approach gave a slightly better result if considering clustered outliers (true positive rate: LOF2\u2009=\u200959.7% SOR2\u2009=\u200953%). For FCD, SOR with low KNN values performed better for one of the two bus tracks, and LOF with high KNN values for the other; these differences are due to very different local point density. We conclude that choice of outlier detection algorithm very much depends on characteristic of the dataset\u2019s point distribution, no one-solution-fits-all. Conclusions provide some information of what characteristics of the datasets can help to choose the optimal method and KNN values

    Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection

    Full text link
    In recent years, there have been many practical applications of anomaly detection such as in predictive maintenance, detection of credit fraud, network intrusion, and system failure. The goal of anomaly detection is to identify in the test data anomalous behaviors that are either rare or unseen in the training data. This is a common goal in predictive maintenance, which aims to forecast the imminent faults of an appliance given abundant samples of normal behaviors. Local outlier factor (LOF) is one of the state-of-the-art models used for anomaly detection, but the predictive performance of LOF depends greatly on the selection of hyperparameters. In this paper, we propose a novel, heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model that uses the proposed method shows good predictive performance in both simulations and real data sets.Comment: 15 pages, 5 figure

    A Real-Time Remote IDS Testbed for Connected Vehicles

    Full text link
    Connected vehicles are becoming commonplace. A constant connection between vehicles and a central server enables new features and services. This added connectivity raises the likelihood of exposure to attackers and risks unauthorized access. A possible countermeasure to this issue are intrusion detection systems (IDS), which aim at detecting these intrusions during or after their occurrence. The problem with IDS is the large variety of possible approaches with no sensible option for comparing them. Our contribution to this problem comprises the conceptualization and implementation of a testbed for an automotive real-world scenario. That amounts to a server-side IDS detecting intrusions into vehicles remotely. To verify the validity of our approach, we evaluate the testbed from multiple perspectives, including its fitness for purpose and the quality of the data it generates. Our evaluation shows that the testbed makes the effective assessment of various IDS possible. It solves multiple problems of existing approaches, including class imbalance. Additionally, it enables reproducibility and generating data of varying detection difficulties. This allows for comprehensive evaluation of real-time, remote IDS.Comment: Peer-reviewed version accepted for publication in the proceedings of the 34th ACM/SIGAPP Symposium On Applied Computing (SAC'19
    • …
    corecore