3,591 research outputs found

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    Get PDF
    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    Rate-Distortion Classification for Self-Tuning IoT Networks

    Full text link
    Many future wireless sensor networks and the Internet of Things are expected to follow a software defined paradigm, where protocol parameters and behaviors will be dynamically tuned as a function of the signal statistics. New protocols will be then injected as a software as certain events occur. For instance, new data compressors could be (re)programmed on-the-fly as the monitored signal type or its statistical properties change. We consider a lossy compression scenario, where the application tolerates some distortion of the gathered signal in return for improved energy efficiency. To reap the full benefits of this paradigm, we discuss an automatic sensor profiling approach where the signal class, and in particular the corresponding rate-distortion curve, is automatically assessed using machine learning tools (namely, support vector machines and neural networks). We show that this curve can be reliably estimated on-the-fly through the computation of a small number (from ten to twenty) of statistical features on time windows of a few hundreds samples

    Mining sensor datasets with spatiotemporal neighborhoods

    Get PDF
    Many spatiotemporal data mining methods are dependent on how relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods to find automatically spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets like those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method to create temporal intervals and a graph-based method to find spatial neighborhoods within each temporal interval. These methods were tested on real-world datasets including (a) sea surface temperature data from the Tropical Atmospheric Ocean Project (TAO) array in the Equatorial Pacific Ocean and (b) NEXRAD precipitation data from the Hydro-NEXRAD system. The results were evaluated based on known patterns of the phenomenon being measured. Furthermore the results were quantified by performing hypothesis testing to establish the statistical significance using Monte Carlo simulations. The approach was also compared with existing approaches using validation metrics namely spatial autocorrelation and temporal interval dissimilarity. The results of these experiments show that our approach indeed identifies highly refined spatiotemporal neighborhoods
    corecore