5,174 research outputs found
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping feature cells into
extended feature, and tracking movement of feature through overlapping in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC.Comment: 14 pages, 40 figure
Estimating Fire Weather Indices via Semantic Reasoning over Wireless Sensor Network Data Streams
Wildfires are frequent, devastating events in Australia that regularly cause
significant loss of life and widespread property damage. Fire weather indices
are a widely-adopted method for measuring fire danger and they play a
significant role in issuing bushfire warnings and in anticipating demand for
bushfire management resources. Existing systems that calculate fire weather
indices are limited due to low spatial and temporal resolution. Localized
wireless sensor networks, on the other hand, gather continuous sensor data
measuring variables such as air temperature, relative humidity, rainfall and
wind speed at high resolutions. However, using wireless sensor networks to
estimate fire weather indices is a challenge due to data quality issues, lack
of standard data formats and lack of agreement on thresholds and methods for
calculating fire weather indices. Within the scope of this paper, we propose a
standardized approach to calculating Fire Weather Indices (a.k.a. fire danger
ratings) and overcome a number of the challenges by applying Semantic Web
Technologies to the processing of data streams from a wireless sensor network
deployed in the Springbrook region of South East Queensland. This paper
describes the underlying ontologies, the semantic reasoning and the Semantic
Fire Weather Index (SFWI) system that we have developed to enable domain
experts to specify and adapt rules for calculating Fire Weather Indices. We
also describe the Web-based mapping interface that we have developed, that
enables users to improve their understanding of how fire weather indices vary
over time within a particular region.Finally, we discuss our evaluation results
that indicate that the proposed system outperforms state-of-the-art techniques
in terms of accuracy, precision and query performance.Comment: 20pages, 12 figure
Estimating Dependency, Monitoring and Knowledge Discovery in High-Dimensional Data Streams
Data Mining – known as the process of extracting knowledge from massive data sets – leads to phenomenal impacts on our society, and now affects nearly every aspect of our lives: from the layout in our local grocery store, to the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, or the efficiency of industrial production processes.
However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and when (2) data comes as a stream. Extracting knowledge from high-dimensional data streams is impractical because one must cope with two orthogonal sets of challenges. On the one hand, the effects of the so-called "curse of dimensionality" bog down the performance of statistical methods and yield to increasingly complex Data Mining problems. On the other hand, the statistical properties of data streams may evolve in unexpected ways, a phenomenon known in the community as "concept drift". Thus, one needs to update their knowledge about data over time, i.e., to monitor the stream.
While previous work addresses high-dimensional data sets and data streams to some extent, the intersection of both has received much less attention. Nevertheless, extracting knowledge in this setting is advantageous for many industrial applications: identifying patterns from high-dimensional data streams in real-time may lead to larger production volumes, or reduce operational costs. The goal of this dissertation is to bridge this gap.
We first focus on dependency estimation, a fundamental task of Data Mining. Typically, one estimates dependency by quantifying the strength of statistical relationships. We identify the requirements for dependency estimation in high-dimensional data streams and propose a new estimation framework, Monte Carlo Dependency Estimation (MCDE), that fulfils them all. We show that MCDE leads to efficient dependency monitoring.
Then, we generalise the task of monitoring by introducing the Scaling Multi-Armed Bandit (S-MAB) algorithms, extending the Multi-Armed Bandit (MAB) model. We show that our algorithms can efficiently monitor statistics by leveraging user-specific criteria.
Finally, we describe applications of our contributions to Knowledge Discovery. We propose an algorithm, Streaming Greedy Maximum Random Deviation (SGMRD), which exploits our new methods to extract patterns, e.g., outliers, in high-dimensional data streams. Also, we present a new approach, that we name kj-Nearest Neighbours (kj-NN), to detect outlying documents within massive text corpora.
We support our algorithmic contributions with theoretical guarantees, as well as extensive experiments against both synthetic and real-world data. We demonstrate the benefits of our methods against real-world use cases. Overall, this dissertation establishes fundamental tools for Knowledge Discovery in high-dimensional data streams, which help with many applications in the industry, e.g., anomaly detection, or predictive maintenance.
To facilitate the application of our results and future research, we publicly release our implementations, experiments, and benchmark data via open-source platforms
Outlier Detection from Network Data with Subnetwork Interpretation
Detecting a small number of outliers from a set of data observations is
always challenging. This problem is more difficult in the setting of multiple
network samples, where computing the anomalous degree of a network sample is
generally not sufficient. In fact, explaining why the network is exceptional,
expressed in the form of subnetwork, is also equally important. In this paper,
we develop a novel algorithm to address these two key problems. We treat each
network sample as a potential outlier and identify subnetworks that mostly
discriminate it from nearby regular samples. The algorithm is developed in the
framework of network regression combined with the constraints on both network
topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus
goes beyond subspace/subgraph discovery and we show that it converges to a
global optimum. Evaluation on various real-world network datasets demonstrates
that our algorithm not only outperforms baselines in both network and high
dimensional setting, but also discovers highly relevant and interpretable local
subnetworks, further enhancing our understanding of anomalous networks
- …