A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. Outliers may be instances of error or may indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and to discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees for selecting the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
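As a rough illustration of the definition above (an observation that differs significantly from the other values), the following minimal sketch flags outliers in a univariate sample with a modified z-score built on the median and the median absolute deviation. This is a generic statistical rule chosen for illustration, not a technique attributed to the paper; the function name `mad_outliers`, the threshold of 3.5, and the sample readings are all assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    Illustrative only: the rule and the default threshold are assumptions,
    not taken from the surveyed paper.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median; this rule cannot rank points
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical sensor readings with one obviously unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
print(mad_outliers(readings))
```

The rule needs no labels, which is what makes it unsupervised, but it only covers a single numeric variable; the multi-type data sets discussed in the abstract would need other classes of technique.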
Rate-Distortion Classification for Self-Tuning IoT Networks
Many future wireless sensor networks and the Internet of Things are expected
to follow a software defined paradigm, where protocol parameters and behaviors
will be dynamically tuned as a function of the signal statistics. New protocols
will then be injected as software when certain events occur. For instance, new
data compressors could be (re)programmed on-the-fly as the monitored signal
type or its statistical properties change. We consider a lossy compression
scenario, where the application tolerates some distortion of the gathered
signal in return for improved energy efficiency. To reap the full benefits of
this paradigm, we discuss an automatic sensor profiling approach where the
signal class, and in particular the corresponding rate-distortion curve, is
automatically assessed using machine learning tools (namely, support vector
machines and neural networks). We show that this curve can be reliably
estimated on-the-fly through the computation of a small number (from ten to
twenty) of statistical features on time windows of a few hundred samples.
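The abstract does not list which statistical features are computed, so the following is only a sketch of the general idea: cut a signal into fixed-size windows and summarize each one with a few cheap statistics that could then be passed to a classifier. The three features used here (mean, variance, lag-1 autocorrelation), the non-overlapping windows, and the function names are assumptions made for illustration.

```python
from statistics import fmean, pvariance


def window_features(window):
    """Summarize one window as (mean, variance, lag-1 autocorrelation).

    The feature set is hypothetical; the paper's own features are not
    given in the abstract.
    """
    mu = fmean(window)
    var = pvariance(window, mu)
    if var == 0:
        return (mu, 0.0, 0.0)  # constant window: autocorrelation is undefined
    n = len(window)
    # Biased lag-1 autocovariance, normalized by the window variance.
    cov = sum((window[i] - mu) * (window[i + 1] - mu) for i in range(n - 1)) / n
    return (mu, var, cov / var)


def sliding_features(signal, size):
    """Compute features over consecutive, non-overlapping windows."""
    return [
        window_features(signal[i:i + size])
        for i in range(0, len(signal) - size + 1, size)
    ]


# Hypothetical signal: a smooth ramp followed by a rapidly alternating part.
signal = [0.0, 1.0, 2.0, 3.0, 1.0, -1.0, 1.0, -1.0]
for features in sliding_features(signal, 4):
    print(features)
```

In the setting described above, each feature tuple would become one input row for a support vector machine or neural network that predicts the signal class; that training step is left out here.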
Mining sensor datasets with spatiotemporal neighborhoods
Many spatiotemporal data mining methods depend on how relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods in order to automatically find spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets like those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method to create temporal intervals and a graph-based method to find spatial neighborhoods within each temporal interval. These methods were tested on real-world datasets including (a) sea surface temperature data from the Tropical Atmospheric Ocean Project (TAO) array in the Equatorial Pacific Ocean and (b) NEXRAD precipitation data from the Hydro-NEXRAD system. The results were evaluated against known patterns of the phenomenon being measured. Furthermore, the results were quantified by hypothesis testing, using Monte Carlo simulations to establish statistical significance. The approach was also compared with existing approaches using two validation metrics, namely spatial autocorrelation and temporal interval dissimilarity. The results of these experiments show that our approach indeed identifies highly refined spatiotemporal neighborhoods.
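The abstract names an agglomerative method for creating temporal intervals without giving its details. As a hedged sketch of what bottom-up interval merging can look like, the code below starts with one interval per time step and repeatedly merges the pair of adjacent intervals whose means are closest, stopping once the closest pair differs by more than a tolerance. The merge criterion, the `max_gap` parameter, and the sample temperatures are assumptions, not the paper's algorithm or data.

```python
def agglomerate_intervals(series, max_gap):
    """Merge adjacent time steps bottom-up into temporal intervals.

    Hypothetical criterion: merge the two neighbouring intervals with the
    smallest difference in mean, as long as that difference is <= max_gap.
    """
    intervals = [[v] for v in series]  # start with one interval per time step
    while len(intervals) > 1:
        means = [sum(iv) / len(iv) for iv in intervals]
        gaps = [abs(means[i + 1] - means[i]) for i in range(len(means) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)  # closest adjacent pair
        if gaps[i] > max_gap:
            break  # remaining intervals are too dissimilar to merge
        intervals[i:i + 2] = [intervals[i] + intervals[i + 1]]
    return intervals


# Hypothetical temperature series with a shift between two regimes.
temps = [20.0, 20.5, 20.2, 25.0, 25.3, 24.9]
print(agglomerate_intervals(temps, max_gap=1.0))
```

Only adjacent intervals are ever merged, so every resulting group is a contiguous span of time. The graph-based spatial step mentioned in the abstract would then run separately inside each interval.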