Outlier Mining Methods Based on Graph Structure Analysis
Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that also has practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset, and the links have associated weights that are the distances between the nodes. Then, the first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm, and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that it is parameter-free and therefore does not require any training; on the other hand, the IsoMap method has two integer parameters, and when they are appropriately selected, the method performs similarly to or better than all the other methods tested. Peer Reviewed. Postprint (published version)
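The percolation idea in this abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: it assumes Euclidean distance, removes links from longest to shortest, and scores each element by the link length at which it first drops out of the largest connected component. The names `percolation_scores`, `largest_component`, `demo_points`, and `demo_scores` are hypothetical.

```python
import math
from itertools import combinations


def largest_component(adj):
    """Return the node set of the largest connected component of `adj`."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def percolation_scores(points):
    """Score each point by the link length at which it leaves the giant component.

    Assumption: the graph is complete and weighted by Euclidean distance.
    Links are removed from longest to shortest, so elements far from
    everything else detach early and receive large scores.
    """
    n = len(points)
    edges = sorted(
        ((math.dist(points[i], points[j]), i, j)
         for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    adj = {i: set(range(n)) - {i} for i in range(n)}
    scores = [0.0] * n
    scored = set()
    for weight, i, j in edges:
        adj[i].discard(j)
        adj[j].discard(i)
        giant = largest_component(adj)
        for node in range(n):
            if node not in giant and node not in scored:
                scores[node] = weight
                scored.add(node)
    return scores


# Four points on a unit square plus one distant point (index 4).
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
demo_scores = percolation_scores(demo_points)
```

Recomputing the components after every removal is quadratic or worse in the number of links, so this form is only suitable for small examples. It takes no tunable parameters, which matches the parameter-free property the abstract highlights.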
STWalk: Learning Trajectory Representations in Temporal Graphs
Analyzing the temporal behavior of nodes in time-varying graphs is useful for
many applications such as targeted advertising, community evolution and outlier
detection. In this paper, we present a novel approach, STWalk, for learning
trajectory representations of nodes in temporal graphs. The proposed framework
makes use of structural properties of graphs at current and previous time-steps
to learn effective node trajectory representations. STWalk performs random
walks on a graph at a given time step (called space-walk) as well as on graphs
from past time-steps (called time-walk) to capture the spatio-temporal behavior
of nodes. We propose two variants of STWalk to learn trajectory
representations. In one algorithm, we perform space-walk and time-walk as part
of a single step. In the other variant, we perform space-walk and time-walk
separately and combine the learned representations to get the final trajectory
embedding. Extensive experiments on three real-world temporal graph datasets
validate the effectiveness of the learned representations when compared to
three baseline methods. We also show the goodness of the learned trajectory
embeddings for change point detection, as well as demonstrate that arithmetic
operations on these trajectory representations yield interesting and
interpretable results. Comment: 10 pages, 5 figures, 2 tables
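The combination of space-walk and time-walk in a single step can be sketched as follows. This is an illustrative simplification, not the STWalk algorithm as published: it assumes the temporal graph is a list of adjacency dictionaries (one per time step), and it introduces a hypothetical parameter `p_time` for the probability of stepping back to the same node in the previous snapshot. The function name `space_time_walk` is also an assumption.

```python
import random


def space_time_walk(snapshots, start, t, length, p_time=0.3, seed=0):
    """Generate one walk of (time, node) pairs over a list of graph snapshots.

    At each step the walker either moves to the same node in the previous
    snapshot (a "time" move) or to a random neighbor in the current
    snapshot (a "space" move).
    """
    rng = random.Random(seed)
    node, time = start, t
    walk = [(time, node)]
    for _ in range(length - 1):
        can_go_back = time > 0 and node in snapshots[time - 1]
        if can_go_back and rng.random() < p_time:
            time -= 1  # time move: same node, earlier snapshot
        else:
            neighbors = sorted(snapshots[time].get(node, ()))
            if not neighbors:
                break  # dead end in the current snapshot
            node = rng.choice(neighbors)  # space move
        walk.append((time, node))
    return walk


# Two snapshots: node "c" appears only at the later time step.
demo_snapshots = [
    {"a": {"b"}, "b": {"a"}},
    {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},
]
demo_walk = space_time_walk(demo_snapshots, start="c", t=1, length=8)
```

Walks of this kind would typically be fed to a skip-gram style embedding model to obtain node trajectory representations; that step is omitted here.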
Spatial Data Quality in the IoT Era: Management and Exploitation
Within the rapidly expanding Internet of Things (IoT), growing amounts of spatially referenced data are being generated. Due to the dynamic, decentralized, and heterogeneous nature of the IoT, spatial IoT data (SID) quality has attracted considerable attention in academia and industry. Inventing and using technologies for managing spatial data quality and for exploiting low-quality spatial data are key challenges in the IoT. In this tutorial, we highlight the SID consumption requirements in applications and offer an overview of spatial data quality in the IoT setting. In addition, we review pertinent technologies for quality management and low-quality data exploitation, and we identify trends and future directions for quality-aware SID management and utilization. The tutorial aims not only to help researchers and practitioners better comprehend SID quality challenges and solutions, but also to offer insights that may enable innovative research and applications.
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping of feature cells into
extended features, and tracking of feature movement through overlap in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC. Comment: 14 pages, 40 figures
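The three steps named in this abstract can be sketched serially on a small 2D grid. This is a hedged illustration, not the paper's parallel implementation: a fixed intensity threshold stands in for the local identification criterion (an assumption), grouping uses 4-connected components, and tracking links blobs in consecutive frames that share at least one cell. All function and variable names are hypothetical.

```python
def feature_cells(frame, threshold):
    """Step 1: mark cells whose value exceeds a threshold (assumed criterion)."""
    return {
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    }


def group_cells(cells):
    """Step 2: group 4-connected feature cells into extended features (blobs)."""
    remaining, blobs = set(cells), []
    while remaining:
        first = remaining.pop()
        blob, stack = {first}, [first]
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    blob.add(nb)
                    stack.append(nb)
        blobs.append(blob)
    blobs.sort(key=min)  # deterministic order: by smallest (row, col) cell
    return blobs


def track(prev_blobs, curr_blobs):
    """Step 3: link blobs across frames when they overlap in space."""
    return [
        (i, j)
        for i, prev in enumerate(prev_blobs)
        for j, curr in enumerate(curr_blobs)
        if prev & curr
    ]


frame0 = [[0, 9, 0, 0],
          [0, 9, 0, 0],
          [0, 0, 0, 8]]
frame1 = [[0, 0, 0, 0],
          [0, 9, 9, 0],
          [0, 0, 0, 0]]
blobs0 = group_cells(feature_cells(frame0, threshold=5))
blobs1 = group_cells(feature_cells(frame1, threshold=5))
links = track(blobs0, blobs1)
```

Here `links` pairs the index of a blob in the earlier frame with the index of each overlapping blob in the later frame. A blob with no link has either disappeared or moved farther than one frame of overlap can capture.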
USING SPATIAL METHODS TO BETTER UNDERSTAND FOOD INSECURITY AND SNAP UNDER-PARTICIPATION IN TEXAS
The overall objective of this research is to use spatial methods to better understand food insecurity and SNAP under-participation in Texas. Paper 1 assesses whether a sample of community-dwelling Medicare and Medicaid beneficiaries who screen positive for food insecurity at healthcare locations in Harris County exhibit a spatial pattern in terms of where they live. In other words, it tests whether there are statistically significant neighborhood hot spots or cold spots of food insecurity against a null hypothesis of complete spatial randomness. This approach is novel because it uses address-level data on patients who report being food insecure to test for statistically significant neighborhood hot spots or cold spots, instead of relying on extant factors like neighborhood poverty rates or the presence of grocery stores. Using address-level food insecurity screening data is often difficult because few organizations screen for food insecurity, and even fewer are willing to share their data due to privacy concerns. Paper 2 utilizes geographical information systems (GIS) to map census tract-level clusters and outliers of households that are eligible but not enrolled (EBNE) in SNAP. The implications of this analysis are vast. Knowing the locations of neighborhood-level clusters and outliers of SNAP EBNE households can inform interventions to address the “SNAP GAP” more effectively. Additionally, this method of identifying neighborhood-level clusters and outliers of SNAP EBNE households can be applied to other safety net programs including Medicaid, the Children’s Health Insurance Program (CHIP), Healthy Texas Women, and the Women, Infants, and Children (WIC) Program.
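The abstract does not name the statistic used to map clusters and outliers. As an assumption, the sketch below uses Local Moran's I, a common choice for this kind of tract-level analysis, with row-standardized neighbor weights. It classifies each tract as part of a cluster (HH or LL) or as a spatial outlier (HL or LH), and it omits the significance testing a real study would require. The function name, the values, and the neighbor lists are hypothetical.

```python
def local_moran(values, neighbors):
    """Return (I_i, label) per area using row-standardized neighbor weights.

    Labels: "HH"/"LL" mark high or low clusters; "HL"/"LH" mark outliers.
    Assumes every area has at least one neighbor and values are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    results = []
    for i in range(n):
        # Spatial lag: average deviation of the neighboring areas.
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        statistic = z[i] * lag / m2
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((statistic, label))
    return results


# Five hypothetical tracts in a row; each borders its immediate neighbors.
ebne_rates = [10, 12, 11, 2, 1]
tract_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
moran = local_moran(ebne_rates, tract_neighbors)
labels = [label for _, label in moran]
```

A positive statistic indicates a tract that resembles its neighbors (a cluster), while a negative one indicates a tract that differs from them (an outlier). In practice, significance is usually assessed with a permutation test before any tract is reported.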